Schemas
Define schemas for consistent data extraction
Schemas define the structure of the data you want to extract. Well-designed schemas produce better, more consistent results.
Schema Basics
Schemas can be defined in YAML (recommended) or JSON. YAML is preferred because it supports comments, making schemas self-documenting:
# Product extraction schema
title: string # The main product title
price: number # Price in the page's currency
available: boolean # Whether the item is in stock
The equivalent JSON:
{
  "title": "string",
  "price": "number",
  "available": "boolean"
}
Type Reference
Primitives
name: string # Text values
count: number # Numeric values (integers or decimals)
active: boolean # true/false values
Arrays
# Simple arrays
tags: [string] # Array of strings
prices: [number] # Array of numbers
# Array of objects
items:
  - name: string # Item name
    qty: number # Quantity in stock
Nested Objects
product:
  name: string # Product display name
  brand:
    name: string # Brand name
    country: string # Country of origin
Best Practices
Be Specific with Field Names
Use descriptive field names - they guide the LLM's extraction:
# Good - descriptive names guide extraction
product_name: string # The main product title
price_usd: number # Price in US dollars
stock_quantity: number # Number of units available
# Less effective - generic names
name: string
price: number
qty: number
Use Comments to Clarify Intent
Comments help the LLM understand exactly what you want:
# E-commerce product schema
title: string # The main product heading, not brand name
price: number # Current sale price, not original/RRP
rating: number # Average rating as a decimal (e.g., 4.5)
rating_text: string # Full rating text (e.g., "4.5 out of 5 stars")
review_count: number # Total number of reviews as integer
Match Types to Expected Data
Choose types that match the actual data format:
rating: number # 4.5 - when you need the numeric value
rating_text: string # "4.5 out of 5" - when you need the full text
review_count: number # 1234 - numeric count
price: number # 29.99 - for calculations
price_text: string # "$29.99" - preserves currency symbol
Handle Missing Data
The LLM returns null for any field it cannot find on the page. Design your schema, and whatever consumes its results, so that a null in any field is handled gracefully.
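One way to cope with nulls is to treat every field as optional when reading a result. The following is a minimal Python sketch, assuming the result arrives as JSON shaped like the product schema from Schema Basics; the `result_json` payload and the `get_field` helper are hypothetical and not part of the Refyne API.

```python
import json

# Hypothetical extraction result: the page showed no price, so it came back as null.
result_json = '{"title": "Desk Lamp", "price": null, "available": true}'


def get_field(record, name, default=None):
    """Return record[name], or default when the field is absent or null."""
    value = record.get(name)
    return default if value is None else value


record = json.loads(result_json)

title = get_field(record, "title", default="(untitled)")
price = get_field(record, "price")  # stays None: a missing price is not 0
available = get_field(record, "available", default=False)

if price is None:
    print(f"{title}: price not found on page")
else:
    print(f"{title}: {price:.2f}")
```

Leaving `price` as `None` rather than defaulting it to 0 keeps "not found" distinct from a real zero, which matters if the value feeds later calculations.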
Schema Catalog
Save and reuse schemas via the API:
# Create a reusable schema
curl -X POST https://api.refyne.uk/api/v1/schemas \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "E-commerce Product",
    "schema_yaml": "# Product details\nname: string # Product title\nprice: number # Current price\ndescription: string"
  }'
Complete Example
A comprehensive e-commerce schema with comments:
# E-commerce product extraction schema
# Use this for extracting product data from online stores
product:
  name: string # Main product title
  brand: string # Manufacturer or brand name
  sku: string # Product SKU or model number
pricing:
  current: number # Current/sale price
  original: number # Original price before discount
  currency: string # Currency code (USD, GBP, EUR)
availability:
  in_stock: boolean # Whether item can be purchased
  quantity: number # Stock count if displayed
  shipping: string # Shipping information
details:
  description: string # Full product description
  specifications:
    - key: string # Spec name (e.g., "Weight")
      value: string # Spec value (e.g., "2.5 kg")
reviews:
  average_rating: number # Rating out of 5
  review_count: number # Total number of reviews
  recent:
    - author: string # Reviewer name
      rating: number # Individual rating
      text: string # Review content
      date: string # Review date
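A result extracted with a nested schema like this can be sanity-checked before it is used. The sketch below is one possible approach in Python, not something the API provides: it assumes the schema has been mirrored by hand as a Python dict (objects as dicts, arrays as one-element lists, primitives as the type names from the Type Reference), and the `check` helper and sample `result` are hypothetical.

```python
def check(schema, value, path="result"):
    """Return a list of type mismatches between value and schema.

    A null (None) is accepted for any field, since missing data comes back as null.
    """
    if value is None:
        return []
    if isinstance(schema, dict):  # nested object
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        errors = []
        for key, sub_schema in schema.items():
            errors += check(sub_schema, value.get(key), f"{path}.{key}")
        return errors
    if isinstance(schema, list):  # array: schema[0] describes each element
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        errors = []
        for i, item in enumerate(value):
            errors += check(schema[0], item, f"{path}[{i}]")
        return errors
    if schema == "boolean":
        ok = isinstance(value, bool)
    elif schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif schema == "string":
        ok = isinstance(value, str)
    else:
        return [f"{path}: unknown type {schema!r}"]
    return [] if ok else [f"{path}: expected {schema}"]


# A hand-written slice of the schema above (assumed representation).
schema = {
    "pricing": {"current": "number", "original": "number", "currency": "string"},
    "availability": {"in_stock": "boolean", "quantity": "number"},
    "details": {"specifications": [{"key": "string", "value": "string"}]},
}

# Hypothetical result with two deliberate type mismatches.
result = {
    "pricing": {"current": 24.99, "original": None, "currency": "GBP"},
    "availability": {"in_stock": "yes", "quantity": 3},
    "details": {"specifications": [{"key": "Weight", "value": 2.5}]},
}

problems = check(schema, result)
for problem in problems:
    print(problem)
```

Here the null `original` price passes, while the string `"yes"` for `in_stock` and the numeric spec value are reported with their paths.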