Scraper API Documentation

Payload limit: 100 KB • Generation rate limit: 5 requests per 60 seconds

Overview

This service queues article generation jobs and processes them using a ChatGPT-based scraper.

Responses are JSON and incoming payloads are sanitized to remove scripts and dangerous HTML fragments.
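A client can stay under the generation rate limit stated at the top of this page (5 requests per 60 seconds) by throttling itself. The following is a minimal sketch, assuming a sliding window; the class name SlidingWindowLimiter is hypothetical, and how the server actually counts its window is not documented here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a request was just sent."""
        self.calls.append(self.clock())
```

Before each generation request, call wait_time(), sleep for that long if it is non-zero, then send the request and call record().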

Endpoints

POST /api/articles/generate

Public endpoint that queues an article generation job and, if webhookUrl is provided, notifies your webhook when the job is done.

curl -X POST http://localhost:3000/api/articles/generate \
  -H "Content-Type: application/json" \
  -d '{"topic":"My topic","webhookUrl":"https://example.com/webhook","webhookSecret":"s3cr3t"}'
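The same request can be built from Python. This is a sketch only: the helper name build_generate_request is hypothetical, and it assumes the documented payload limit means 100 * 1024 bytes of encoded JSON. The field names topic, webhookUrl and webhookSecret are taken from the curl example above.

```python
import json
import urllib.request

MAX_PAYLOAD_BYTES = 100 * 1024  # assumption: the documented limit is 100 * 1024 bytes


def build_generate_request(topic, webhook_url=None, webhook_secret=None,
                           base_url="http://localhost:3000"):
    """Build (but do not send) a POST request for /api/articles/generate."""
    payload = {"topic": topic}
    if webhook_url:
        payload["webhookUrl"] = webhook_url
    if webhook_secret:
        payload["webhookSecret"] = webhook_secret
    body = json.dumps(payload).encode("utf-8")
    # Fail early on the client instead of waiting for the server to reject the body.
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(body)} bytes, over the documented limit")
    return urllib.request.Request(
        base_url + "/api/articles/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request.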

When the job finishes, your webhook receives a JSON POST of the form { success: true, jobId, data }. If you supplied webhookSecret, the request also carries an X-Webhook-Signature header.
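On the receiving side, the signature header can be checked before the payload is trusted. The sketch below assumes the outgoing signature uses the same scheme this page describes for the test endpoint (hex-encoded HMAC-SHA256 of the raw JSON body, keyed with your secret); that is an assumption, so confirm it against a real delivery. The function name verify_webhook_signature is hypothetical.

```python
import hashlib
import hmac


def verify_webhook_signature(raw_body: bytes, secret: str, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 hex digest of raw_body.

    Assumption: the sender signs the exact request bytes with the shared secret.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    # compare_digest runs in constant time, so it does not leak where a mismatch occurs.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Verify against the raw bytes as received, not a re-serialized copy of the parsed JSON, since key order or whitespace may differ after a round trip.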

POST /api/webhook/test

Test endpoint that accepts webhook POSTs and stores them to /logs/webhooks/*.json. If you provide a secret (either as the query parameter ?secret=... or as the header X-Webhook-Secret), also send the header X-Webhook-Signature containing the hex-encoded HMAC-SHA256 of the JSON body so that the signature can be verified.

BODY='{"jobId":"1","result":"ok"}'
curl -X POST http://localhost:3000/api/webhook/test \
  -H "Content-Type: application/json" \
  -H "X-Webhook-Secret: s3cr3t" \
  -H "X-Webhook-Signature: $(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "s3cr3t" | sed 's/^.*= //')" \
  -d "$BODY"
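The signature has to be computed over exactly the bytes that are sent as the request body, otherwise verification fails. A minimal Python sketch of the sender side follows; the helper name signed_test_headers is hypothetical, and it assumes the endpoint verifies the raw body as received.

```python
import hashlib
import hmac
import json


def signed_test_headers(payload: dict, secret: str):
    """Serialize payload once and return (body_bytes, headers) for /api/webhook/test."""
    # Compact separators give a stable byte string; sign and send these same bytes.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Secret": secret,
        "X-Webhook-Signature": signature,
    }
    return body, headers
```

Serializing once and reusing the resulting bytes for both the HMAC and the HTTP body avoids any mismatch between what was signed and what was sent.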

GET /api/articles/queue/status

Returns the current queue status (messageCount, consumerCount).
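A client might decode the status response like this. The sketch assumes messageCount and consumerCount are top-level keys of the JSON response; the exact response envelope is not shown on this page, and the names QueueStatus and parse_queue_status are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class QueueStatus:
    message_count: int   # jobs waiting in the queue
    consumer_count: int  # workers currently consuming the queue


def parse_queue_status(response_text: str) -> QueueStatus:
    """Decode the body of GET /api/articles/queue/status.

    Assumption: both counters are top-level keys of the JSON object.
    """
    data = json.loads(response_text)
    return QueueStatus(int(data["messageCount"]), int(data["consumerCount"]))
```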

Security Notes

Sanitization: All incoming string fields are sanitized to remove obvious XSS vectors (scripts, inline event handlers, javascript: URIs).
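To illustrate the kind of filtering described, here is a deliberately simple sketch that strips the three vectors named above from every string in a decoded payload. It is not the service's actual sanitizer; all names are hypothetical, and regular expressions like these are easy to bypass, so a vetted HTML sanitizer is the safer choice in real code.

```python
import re

# Illustrative patterns only; they cover the obvious cases, not every bypass.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
EVENT_ATTR_RE = re.compile(r"""\s+on\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)
JS_URI_RE = re.compile(r"javascript\s*:", re.IGNORECASE)


def sanitize_string(value: str) -> str:
    """Remove script blocks, inline event handlers and javascript: URIs."""
    value = SCRIPT_RE.sub("", value)
    # Only touch on*= attributes that appear inside a tag, not ordinary prose.
    value = TAG_RE.sub(lambda m: EVENT_ATTR_RE.sub("", m.group(0)), value)
    return JS_URI_RE.sub("", value)


def sanitize_payload(obj):
    """Recursively sanitize every string value in a decoded JSON value."""
    if isinstance(obj, str):
        return sanitize_string(obj)
    if isinstance(obj, list):
        return [sanitize_payload(v) for v in obj]
    if isinstance(obj, dict):
        return {k: sanitize_payload(v) for k, v in obj.items()}
    return obj
```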

IP Whitelist & Blacklist

The public generation endpoint /api/articles/generate-public is protected by an IP access control list.
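As an illustration of how such an access control list can be evaluated, the sketch below checks a client address against whitelist and blacklist entries. The precedence rules are assumptions made for the example (a blacklist match always denies; an empty whitelist allows everyone else), not documented behavior of this service, and the function name is_ip_allowed is hypothetical.

```python
import ipaddress


def is_ip_allowed(client_ip: str, whitelist, blacklist) -> bool:
    """Decide whether client_ip may access a protected route.

    Assumptions: entries are single IPs or CIDR ranges, a blacklist match
    always denies, and an empty whitelist means "allow anyone not blacklisted".
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # strict=False accepts entries such as "10.0.0.5/24" with host bits set.
        return any(ip in ipaddress.ip_network(entry, strict=False) for entry in entries)

    if matches(blacklist):
        return False
    return not whitelist or matches(whitelist)
```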

Behavior when hitting protected routes:

Postman collection for testing: download the collection JSON and import it into Postman. Use the included environment Scraper Local (file: Scraper.environment.json).

Support

If you have any questions about the scraper, please send an email to: misbakhul2904@gmail.com or get in touch via Facebook: facebook.com/Misbakhul29