Scrape Targets (Layer B)
Layer B describes where to acquire data. A packet that includes scrape_targets[] (plus extraction rules) reaches conformance level L2 — it's a reproducible recipe for gathering a class of knowledge. At L2+, at least one scrape target is required.
ScrapeTarget fields
| Field | Type | Notes |
|---|---|---|
| id | string | Required |
| target_type | enum | Required — see below |
| url_template | string | Required — URL with {placeholder} variables |
| name · description | string | |
| method | enum GET/POST/PUT/PATCH | Default GET |
| headers | object<string,string> | |
| body_template | object | Request body for POST targets |
| auth | AuthConfig | See Auth |
| rate_limit | object | { requests_per_minute, burst } |
| pagination | object | See Pagination |
| cursor_field | string | Response field used as the ETL cursor |
| response_type | enum | json/xml/html/csv/jsonl/binary |
| extraction_rule_ids | string[] | ExtractionRule ids to apply |
| schedule | string | Cron expression for recurring acquisition |
| enabled | boolean | Default true |
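As an illustration of how the {placeholder} variables in url_template might be filled in, here is a minimal Python sketch. The function name expand_url_template, the example.org URL, and the choice to URL-encode every value are assumptions made for this example; the source does not specify how a runner performs the substitution.

```python
import re
from urllib.parse import quote

# Matches {name} placeholders made of letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_url_template(template, variables):
    """Replace each {name} in the template with a URL-encoded value.

    Hypothetical helper: raising on a missing variable is an assumption,
    not documented behaviour.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing value for placeholder {{{name}}}")
        # Encode everything, including '/' and '&', so a value cannot
        # alter the structure of the URL.
        return quote(str(variables[name]), safe="")

    return PLACEHOLDER.sub(substitute, template)


url = expand_url_template(
    "https://example.org/search?term={query}&retmax={limit}",
    {"query": "heart attack", "limit": 20},
)
print(url)
```

Failing loudly on a missing variable keeps a half-expanded URL from being requested by accident.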
Target types
rest_api · graphql · rss_feed · html_page · sitemap · oai_pmh · websocket · file_download · database_query
Auth
AuthConfig keeps secrets out of the packet — reference an environment variable, never a literal credential:
| Field | Notes |
|---|---|
| type | none/api_key/bearer/basic/oauth2 (required) |
| key_env | env var name holding the key (e.g. USDA_API_KEY) |
| header_name | header to inject into (default Authorization) |
| prefix | value prefix (e.g. Bearer , Key ) |
The registry publish gate scans for embedded secrets — a literal credential here is rejected. See Publish & sign.
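The following Python sketch shows one way a runner could turn an AuthConfig into a request header at run time, reading the secret from the environment so it never appears in the packet. The function name build_auth_headers, the X-Api-Key header, and the decision to leave basic and oauth2 unimplemented are assumptions for illustration; the source does not describe how those types are resolved.

```python
import os


def build_auth_headers(auth, env=os.environ):
    """Build a headers dict from an AuthConfig-shaped dict.

    Hypothetical helper covering only the none, api_key, and bearer types.
    """
    auth_type = auth.get("type", "none")
    if auth_type == "none":
        return {}
    if auth_type not in ("api_key", "bearer"):
        # basic and oauth2 need extra steps the source does not specify.
        raise NotImplementedError(f"auth type {auth_type!r} is not covered here")

    key_env = auth["key_env"]
    secret = env.get(key_env)
    if secret is None:
        raise RuntimeError(f"environment variable {key_env} is not set")

    header_name = auth.get("header_name", "Authorization")  # documented default
    return {header_name: auth.get("prefix", "") + secret}


# The env mapping is passed explicitly here so the example runs without
# touching the real environment.
headers = build_auth_headers(
    {"type": "api_key", "key_env": "USDA_API_KEY", "header_name": "X-Api-Key"},
    env={"USDA_API_KEY": "demo-value"},
)
print(headers)
```

Because the packet stores only the variable name, the same packet can be published and run with different credentials on each machine.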
Pagination
The pagination object has the form { style, param_name, page_size, max_pages }, where style is one of offset / cursor / page / link_header / none.
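To make the offset style concrete, this Python sketch generates the query parameter for each successive page. It assumes a zero-based offset that advances by page_size and treats a missing max_pages as "no limit"; both are assumptions, since the source lists the fields without defining their semantics. The name offset_pages is hypothetical.

```python
def offset_pages(pagination):
    """Yield one {param_name: offset} dict per page for offset pagination."""
    if pagination.get("style") != "offset":
        raise ValueError("this sketch handles only style == 'offset'")

    page_size = pagination["page_size"]
    max_pages = pagination.get("max_pages")  # None means unbounded (assumed)
    page = 0
    while max_pages is None or page < max_pages:
        # Assumed: offsets start at 0 and step by page_size.
        yield {pagination["param_name"]: page * page_size}
        page += 1


pages = list(
    offset_pages(
        {"style": "offset", "param_name": "retstart", "page_size": 20, "max_pages": 3}
    )
)
print(pages)
```

With no max_pages the generator never ends on its own, so a caller would stop iterating once a response comes back empty.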
Example
```json
{
  "id": "tgt-pubmed",
  "target_type": "rest_api",
  "url_template": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term={query}&retmax={limit}",
  "method": "GET",
  "response_type": "xml",
  "pagination": { "style": "offset", "param_name": "retstart", "page_size": 20 },
  "extraction_rule_ids": ["rule-pubmed-ids"]
}
```

→ Next: Extraction Rules (Layer C) · Source Fields (Layer A)