Scrape Targets (Layer B)

Layer B describes where to acquire data. A packet that includes scrape_targets[] (plus extraction rules) reaches conformance level L2 — it's a reproducible recipe for gathering a class of knowledge. At L2+, at least one scrape target is required.

ScrapeTarget fields

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Required |
| target_type | enum | Required; see Target types |
| url_template | string | Required; URL with {placeholder} variables |
| name, description | string | |
| method | enum (GET/POST/PUT/PATCH) | Default GET |
| headers | object<string,string> | |
| body_template | object | Request body for POST targets |
| auth | AuthConfig | See Auth |
| rate_limit | object | { requests_per_minute, burst } |
| pagination | object | See Pagination |
| cursor_field | string | Response field used as the ETL cursor |
| response_type | enum | json/xml/html/csv/jsonl/binary |
| extraction_rule_ids | string[] | ExtractionRule ids to apply |
| schedule | string | Cron expression for recurring acquisition |
| enabled | boolean | Default true |
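
The url_template field carries {placeholder} variables that have to be filled in before a request is made. This page does not say how a runner performs that substitution, so the following is only a minimal sketch under stated assumptions: the helper name render_url is hypothetical, placeholders are assumed to be simple identifiers, and each value is assumed to be percent-encoded before insertion.

```python
import re
from urllib.parse import quote

# Assumption: placeholders look like {name}, where name is a simple identifier.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def render_url(url_template: str, params: dict[str, object]) -> str:
    """Fill each {placeholder} in url_template with a percent-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing value for placeholder {{{name}}}")
        # safe="" also encodes "/" so a value cannot alter the URL structure.
        return quote(str(params[name]), safe="")

    return PLACEHOLDER.sub(substitute, url_template)


if __name__ == "__main__":
    template = (
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
        "?db=pubmed&term={query}&retmax={limit}"
    )
    print(render_url(template, {"query": "gene therapy", "limit": 20}))
```

Raising on a missing placeholder, rather than leaving it in the URL, is a design choice of this sketch; a real runner may behave differently.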

Target types

rest_api · graphql · rss_feed · html_page · sitemap ·
oai_pmh · websocket · file_download · database_query
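
As an illustration of the required fields and the target_type values listed above, here is a small validation sketch. The function name validate_target and the error message wording are hypothetical; an actual registry or validator may check more than this, or report problems differently.

```python
# Values copied from the Target types list on this page.
TARGET_TYPES = {
    "rest_api", "graphql", "rss_feed", "html_page", "sitemap",
    "oai_pmh", "websocket", "file_download", "database_query",
}

# Fields marked Required in the ScrapeTarget table.
REQUIRED_FIELDS = ("id", "target_type", "url_template")


def validate_target(target: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = target.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field}: required non-empty string")
    target_type = target.get("target_type")
    if isinstance(target_type, str) and target_type and target_type not in TARGET_TYPES:
        problems.append(f"target_type: unknown value {target_type!r}")
    return problems


if __name__ == "__main__":
    print(validate_target({"id": "tgt-pubmed", "target_type": "rest_api"}))
```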

Auth

AuthConfig keeps secrets out of the packet — reference an environment variable, never a literal credential:

| Field | Notes |
| --- | --- |
| type | none/api_key/bearer/basic/oauth2 (required) |
| key_env | Env var name holding the key (e.g. USDA_API_KEY) |
| header_name | Header to inject into (default Authorization) |
| prefix | Value prefix (e.g. "Bearer ", "Key ") |

The registry publish gate scans for embedded secrets — a literal credential here is rejected. See Publish & sign.
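
To make the environment-variable indirection concrete, the sketch below resolves an AuthConfig into request headers at run time. It is an assumption-laden illustration, not the reference behavior: the function name build_auth_headers is hypothetical, the prefix is assumed to be prepended verbatim to the secret, and only the none, api_key, and bearer types are handled because this page does not describe how basic or oauth2 are resolved.

```python
import os
from typing import Mapping


def build_auth_headers(auth: dict, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Turn an AuthConfig into headers, reading the secret from the environment."""
    auth_type = auth["type"]  # required field
    if auth_type == "none":
        return {}
    if auth_type not in ("api_key", "bearer"):
        # basic and oauth2 need details this page does not specify.
        raise NotImplementedError(f"auth type {auth_type!r} is not covered by this sketch")

    key_env = auth["key_env"]
    secret = env.get(key_env)
    if secret is None:
        raise RuntimeError(f"environment variable {key_env} is not set")

    header_name = auth.get("header_name", "Authorization")  # documented default
    prefix = auth.get("prefix", "")  # assumption: no prefix unless one is given
    return {header_name: f"{prefix}{secret}"}


if __name__ == "__main__":
    config = {"type": "bearer", "key_env": "USDA_API_KEY", "prefix": "Bearer "}
    # A fake environment is passed in so the example runs without a real key.
    print(build_auth_headers(config, {"USDA_API_KEY": "example-token"}))
```

Passing the environment as a parameter keeps the packet free of secrets while still letting the helper be exercised with a stand-in mapping.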

Pagination

{ style, param_name, page_size, max_pages }, where style is one of offset / cursor / page / link_header / none.
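
The page does not spell out how each pagination style is executed, so the following sketch covers only the offset style and rests on assumptions: the offset is taken to be a zero-based record index that advances by page_size, and max_pages, when present, caps the number of requests. The generator name offset_pages is hypothetical.

```python
from itertools import islice
from typing import Iterator


def offset_pages(pagination: dict) -> Iterator[dict[str, int]]:
    """Yield the query parameter for each successive page of an offset-style target."""
    if pagination.get("style") != "offset":
        raise ValueError("this sketch only handles style 'offset'")
    param_name = pagination["param_name"]
    page_size = pagination["page_size"]
    max_pages = pagination.get("max_pages")  # None means no cap in this sketch

    page = 0
    while max_pages is None or page < max_pages:
        # Assumption: the offset counts records, starting from zero.
        yield {param_name: page * page_size}
        page += 1


if __name__ == "__main__":
    pagination = {"style": "offset", "param_name": "retstart", "page_size": 20}
    # Without max_pages the generator is unbounded, so take only three pages here.
    print(list(islice(offset_pages(pagination), 3)))
    # prints [{'retstart': 0}, {'retstart': 20}, {'retstart': 40}]
```

Without max_pages the caller is expected to stop iterating, for example when a response comes back empty.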

Example

```json
{
  "id": "tgt-pubmed",
  "target_type": "rest_api",
  "url_template": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term={query}&retmax={limit}",
  "method": "GET",
  "response_type": "xml",
  "pagination": { "style": "offset", "param_name": "retstart", "page_size": 20 },
  "extraction_rule_ids": ["rule-pubmed-ids"]
}
```

→ Next: Extraction Rules (Layer C) · Source Fields (Layer A)

Released under the MIT License.