Extraction Rules (Layer C)
Layer C describes how to parse and normalize what a scrape target returns. Together, Layers B and C bring a packet to L2. At L2 and above, at least one extraction rule is required.
ExtractionRule fields
| Field | Type | Notes |
|---|---|---|
| id | string | Required |
| rule_type | enum | Required (see Rule types below) |
| name · description | string | |
| expression | string | The rule expression (jq filter, JSONPath, CSS selector, regex, …) |
| llm_prompt | string | For llm_extract; use {content} for the input text |
| output_contract_id | string | Id of the DataContract this rule outputs |
| field_map | object<string,string> | For field_map; maps source_field → dest_field |
| apply_to | string | JSONPath/field within the response to apply to |
| fallback | any | Default value if extraction yields null |
| post_process | enum[] | Post-processing pipeline (see Post-processing below) |
| _extensions | object | |
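The field table can be mirrored in code as a small typed record. The following is a minimal sketch, not a reference implementation: the names `RuleType`, `ExtractionRule`, `rule_from_dict`, and `render_prompt` are hypothetical, the `extensions` attribute stands in for `_extensions`, and the literal substitution of `{content}` is an assumption about how the placeholder is filled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class RuleType(str, Enum):
    """The rule_type values listed in this section."""
    JQ_TRANSFORM = "jq_transform"
    JSONPATH = "jsonpath"
    CSS_SELECTOR = "css_selector"
    XPATH = "xpath"
    REGEX = "regex"
    LLM_EXTRACT = "llm_extract"
    PYTHON_FN = "python_fn"
    JS_FN = "js_fn"
    FIELD_MAP = "field_map"


@dataclass
class ExtractionRule:
    # Required fields
    id: str
    rule_type: RuleType
    # Optional fields, in the order of the table above
    name: Optional[str] = None
    description: Optional[str] = None
    expression: Optional[str] = None
    llm_prompt: Optional[str] = None
    output_contract_id: Optional[str] = None
    field_map: dict[str, str] = field(default_factory=dict)
    apply_to: Optional[str] = None
    fallback: Any = None
    post_process: list[str] = field(default_factory=list)
    extensions: dict[str, Any] = field(default_factory=dict)  # "_extensions" in JSON


def rule_from_dict(raw: dict[str, Any]) -> ExtractionRule:
    """Build a rule from parsed JSON, enforcing the two required fields."""
    for key in ("id", "rule_type"):
        if not raw.get(key):
            raise ValueError(f"missing required field: {key}")
    data = dict(raw)
    data["rule_type"] = RuleType(data["rule_type"])  # ValueError on an unknown type
    data["extensions"] = data.pop("_extensions", {})
    return ExtractionRule(**data)


def render_prompt(rule: ExtractionRule, content: str) -> str:
    """Fill the {content} placeholder (assumed to be a literal substitution)."""
    if rule.llm_prompt is None:
        raise ValueError("rule has no llm_prompt")
    # str.replace rather than str.format, so other braces in the prompt are left alone.
    return rule.llm_prompt.replace("{content}", content)
```

Using `str.replace` instead of `str.format` is a deliberate choice in this sketch: prompts that ask for JSON output often contain literal braces, which `format` would try to interpret.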
Rule types
jq_transform · jsonpath · css_selector · xpath · regex ·
llm_extract · python_fn · js_fn · field_map
Post-processing
post_process[] applies an ordered pipeline of normalizers:
trim · lowercase · uppercase · parse_int · parse_float ·
parse_date · strip_html · truncate_512
Examples
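To illustrate what "ordered pipeline" means for post_process, here is a minimal sketch that applies the listed normalizers left to right. The `NORMALIZERS` table and `apply_post_process` function are hypothetical names, and several behaviors are assumptions not stated in this section: parse_date is assumed to accept ISO 8601 strings, strip_html is approximated with a regex, list results are normalized element by element, and fallback is substituted before any normalizer runs.

```python
import html
import re
from datetime import datetime
from typing import Any, Callable

# Hypothetical implementations of the normalizers named above.
NORMALIZERS: dict[str, Callable[[Any], Any]] = {
    "trim": lambda v: v.strip(),
    "lowercase": lambda v: v.lower(),
    "uppercase": lambda v: v.upper(),
    "parse_int": lambda v: int(v),
    "parse_float": lambda v: float(v),
    # Assumption: dates arrive as ISO 8601 strings such as "2024-03-01".
    "parse_date": lambda v: datetime.fromisoformat(v).date(),
    # Rough approximation: drop tags with a regex, then decode entities.
    "strip_html": lambda v: html.unescape(re.sub(r"<[^>]+>", "", v)),
    "truncate_512": lambda v: v[:512],
}


def apply_post_process(value: Any, steps: list[str], fallback: Any = None) -> Any:
    """Run each normalizer in order; return fallback when extraction gave None."""
    if value is None:
        return fallback
    if isinstance(value, list):
        # Assumption: multi-valued results are normalized item by item.
        return [apply_post_process(item, steps, fallback) for item in value]
    for step in steps:
        value = NORMALIZERS[step](value)  # KeyError on an unknown step name
    return value


print(apply_post_process("  <b>Hello</b> World ", ["strip_html", "trim", "lowercase"]))
# prints: hello world
print(apply_post_process([" 1 ", " 2 "], ["trim", "parse_int"]))
# prints: [1, 2]
```

Order matters: running parse_int before trim happens to work in Python because `int` tolerates surrounding whitespace, but running lowercase after parse_int would fail, since the value is no longer a string.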
```json
{
  "id": "rule-pubmed-ids",
  "rule_type": "xpath",
  "expression": "//IdList/Id/text()",
  "apply_to": "$",
  "post_process": ["trim"]
}
```

```json
{
  "id": "rule-extract-findings",
  "rule_type": "llm_extract",
  "llm_prompt": "From the abstract below, extract the primary outcome and effect size as JSON. {content}",
  "output_contract_id": "contract-finding"
}
```

→ Next: Directives (Layer D) · Scrape Targets (Layer B)