Extraction Rules (Layer C)

Layer C describes how to parse and normalize what a scrape target returns. Layers B and C together bring a packet to level L2. From L2 upward, at least one extraction rule is required.

ExtractionRule fields

| Field | Type | Notes |
| --- | --- | --- |
| `id` | string | Required |
| `rule_type` | enum | Required; see Rule types below |
| `name` · `description` | string | |
| `expression` | string | The rule expression (jq filter, JSONPath, CSS selector, regex, …) |
| `llm_prompt` | string | For `llm_extract`: use `{content}` for the input text |
| `output_contract_id` | string | Id of the DataContract this rule outputs |
| `field_map` | `object<string,string>` | For `field_map`: `source_field → dest_field` |
| `apply_to` | string | JSONPath/field within the response to apply to |
| `fallback` | any | Default value if extraction yields null |
| `post_process` | enum[] | Post-processing pipeline; see Post-processing below |
| `_extensions` | object | |
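As a rough illustration of the fields above, a rule could be modeled in Python along these lines. The name `ExtractionRule` comes from this page. The dataclass layout, the `RULE_TYPES` constant, and the `validate` helper are assumptions made for the sketch and are not part of the spec. `_extensions` is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Rule types as listed on this page.
RULE_TYPES = {
    "jq_transform", "jsonpath", "css_selector", "xpath", "regex",
    "llm_extract", "python_fn", "js_fn", "field_map",
}


@dataclass
class ExtractionRule:
    id: str                                   # required
    rule_type: str                            # required, one of RULE_TYPES
    name: Optional[str] = None
    description: Optional[str] = None
    expression: Optional[str] = None
    llm_prompt: Optional[str] = None
    output_contract_id: Optional[str] = None
    field_map: dict[str, str] = field(default_factory=dict)
    apply_to: Optional[str] = None
    fallback: Any = None
    post_process: list[str] = field(default_factory=list)


def validate(rule: ExtractionRule) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems = []
    if not rule.id:
        problems.append("id is required")
    if rule.rule_type not in RULE_TYPES:
        problems.append(f"unknown rule_type: {rule.rule_type!r}")
    return problems
```

Only the two fields marked Required are checked here. Any further constraints (for example, whether `llm_extract` must carry an `llm_prompt`) are not stated on this page and are deliberately not enforced.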

Rule types

jq_transform · jsonpath · css_selector · xpath · regex ·
llm_extract · python_fn · js_fn · field_map
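To make the rule types more concrete, here is a minimal sketch of how two of the simpler ones, `regex` and `field_map`, might be evaluated. The function `apply_rule` is hypothetical, and the exact semantics are assumptions: the regex branch returns the first capture group if there is one (otherwise the whole match), and the `field_map` branch silently skips source fields that are absent.

```python
import re
from typing import Any


def apply_rule(rule: dict, data: Any) -> Any:
    """Evaluate a `regex` or `field_map` rule against already-fetched data."""
    rule_type = rule["rule_type"]
    if rule_type == "regex":
        match = re.search(rule["expression"], data)
        if match is None:
            result = None
        else:
            # Assumption: prefer the first capture group when one is defined.
            result = match.group(1) if match.groups() else match.group(0)
    elif rule_type == "field_map":
        # source_field -> dest_field; missing source fields are skipped.
        result = {
            dest: data[src]
            for src, dest in rule["field_map"].items()
            if src in data
        }
    else:
        raise NotImplementedError(
            f"rule_type {rule_type!r} is not covered by this sketch"
        )
    # `fallback` is the default value when extraction yields null.
    return rule.get("fallback") if result is None else result
```

The other rule types would need an external engine (a jq binding, a CSS or XPath library, an LLM client), which is why they are left out of the sketch.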

Post-processing

post_process[] applies an ordered pipeline of normalizers:

trim · lowercase · uppercase · parse_int · parse_float ·
parse_date · strip_html · truncate_512
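The ordered pipeline can be sketched as a lookup table of functions applied left to right. The normalizer names are the ones listed above. Their exact behavior is not specified on this page, so the implementations below are assumptions: `parse_date` expects ISO 8601 input, `strip_html` is a naive tag remover, and `truncate_512` keeps the first 512 characters.

```python
import re
from datetime import date
from typing import Any, Callable

NORMALIZERS: dict[str, Callable[[Any], Any]] = {
    "trim": lambda v: v.strip(),
    "lowercase": lambda v: v.lower(),
    "uppercase": lambda v: v.upper(),
    "parse_int": lambda v: int(v),
    "parse_float": lambda v: float(v),
    # Assumption: dates arrive as ISO 8601 strings such as "2024-01-31".
    "parse_date": lambda v: date.fromisoformat(v),
    # Naive tag removal; not a substitute for a real HTML parser.
    "strip_html": lambda v: re.sub(r"<[^>]+>", "", v),
    # Assumption: the limit is 512 characters, not bytes.
    "truncate_512": lambda v: v[:512],
}


def post_process(value: Any, steps: list[str]) -> Any:
    """Apply each named normalizer in order, feeding the result forward."""
    for step in steps:
        value = NORMALIZERS[step](value)
    return value
```

Order matters: `post_process("  <b>42</b> ", ["strip_html", "trim", "parse_int"])` returns the integer `42`, whereas running `parse_int` first would fail on the markup.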

Examples

json

{
  "id": "rule-pubmed-ids",
  "rule_type": "xpath",
  "expression": "//IdList/Id/text()",
  "apply_to": "$",
  "post_process": ["trim"]
}
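For a sense of what this rule yields, the sketch below evaluates an equivalent path with Python's standard library and then applies `trim`. The sample XML is made up for illustration, and `extract_ids` is a hypothetical helper. Note that `xml.etree.ElementTree` supports only a subset of XPath and has no `text()` step, so the code selects the `Id` elements and reads their text instead of running the expression verbatim.

```python
import xml.etree.ElementTree as ET

# Made-up sample response for illustration; real output may differ.
SAMPLE = """<eSearchResult>
  <IdList>
    <Id> 11111111 </Id>
    <Id>22222222</Id>
  </IdList>
</eSearchResult>"""


def extract_ids(xml_text: str) -> list[str]:
    """Select every IdList/Id element and return its trimmed text."""
    root = ET.fromstring(xml_text)
    # Equivalent of //IdList/Id/text() followed by the "trim" post-processor.
    return [(el.text or "").strip() for el in root.findall(".//IdList/Id")]
```

With the sample above, `extract_ids(SAMPLE)` returns `['11111111', '22222222']`. A full XPath engine would be needed to evaluate the expression exactly as written.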

json

{
  "id": "rule-extract-findings",
  "rule_type": "llm_extract",
  "llm_prompt": "From the abstract below, extract the primary outcome and effect size as JSON. {content}",
  "output_contract_id": "contract-finding"
}
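Filling the `{content}` placeholder could look like the following. The helper name `render_prompt` is hypothetical, and how the rendered prompt is sent to a model is outside the scope of this sketch.

```python
def render_prompt(llm_prompt: str, content: str) -> str:
    """Substitute the input text for the {content} placeholder."""
    # str.replace is used instead of str.format so that literal braces
    # elsewhere in the prompt (for example a JSON snippet) are left alone.
    return llm_prompt.replace("{content}", content)
```

Using plain replacement is a design choice for the sketch: prompts that ask for JSON output often contain braces, which `str.format` would try to interpret as fields.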

→ Next: Directives (Layer D) · Scrape Targets (Layer B)
