The Advanced Web Scraper API supports regular-expression (regex) extraction for pulling structured data out of HTML content.
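Regex selectors can be used in two ways. The first puts a slash-delimited regex directly in `selector`:

```json
{
  "selector": "/£([0-9,]+)/",
  "type": "regex"
}
```

The second uses a CSS/XPath selector in `selector` to choose an element, and applies the regex in `pattern` to that element's content:

```json
{
  "selector": "address.propertyCard-address",
  "type": "regex",
  "pattern": "/\\b([A-Z]{1,2}\\d[A-Z\\d]?\\s\\d[A-Z]{2})\\b/"
}
```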
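To illustrate how a slash-delimited `selector` value could map onto a regular expression, the sketch below turns such a string into a JavaScript `RegExp`. It assumes the API follows JavaScript regex-literal syntax, which the JavaScript `transform` expression in the price example suggests but does not confirm; `parseRegexLiteral` is a hypothetical helper, not part of the API. It also shows that the doubled backslashes in the config are JSON string escaping: `"\\b"` in the config is the regex token `\b`.

```javascript
// Hypothetical helper (not part of the API): converts a slash-delimited
// string such as "/£([0-9,]+)/" or "/abc/gi" into a JavaScript RegExp.
function parseRegexLiteral(literal) {
  const lastSlash = literal.lastIndexOf("/");
  // Without a leading slash and a separate closing slash, treat the whole
  // string as the pattern body with no flags.
  if (!literal.startsWith("/") || lastSlash === 0) {
    return new RegExp(literal);
  }
  const body = literal.slice(1, lastSlash); // text between the slashes
  const flags = literal.slice(lastSlash + 1); // trailing flags, e.g. "gi"
  return new RegExp(body, flags);
}

// In JSON (as in JavaScript strings) "\\b" is the two characters \b,
// so this config value becomes the regex /\b(...)\b/.
const postcode = parseRegexLiteral("/\\b([A-Z]{1,2}\\d[A-Z\\d]?\\s\\d[A-Z]{2})\\b/");
console.log(postcode.exec("London SW1A 2AA")[1]); // SW1A 2AA

const price = parseRegexLiteral("/£([0-9,]+)/");
console.log(price.exec("Guide price £325,000")[1]); // 325,000
```

Splitting on the last slash lets slashes inside the pattern body (such as `\/` in a URL pattern) pass through unchanged.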
| Parameter | Type | Description | Example |
|---|---|---|---|
| `selector` | string | Regex pattern (with slashes) or CSS/XPath selector | `"/£([0-9,]+)/"` |
| `type` | string | Must be `"regex"` | `"regex"` |
| `pattern` | string | Regex pattern (without slashes) applied to the content chosen by a CSS/XPath `selector`; an alternative to placing the regex in `selector` | `"\\b([A-Z]{1,2}\\d[A-Z\\d]?\\s\\d[A-Z]{2})\\b"` |
| `flags` | string | Regex flags (`g`, `i`, `m`, `s`) | `"gi"` |
| `group` | number | Capture group to extract (`0` = full match) | `1` |
| `source` | string | Content to match against: `"html"`, `"text"`, or a CSS/XPath selector | `"html"` |
| `multiple` | boolean | Return all matches as an array (`true`) or only the first match (`false`) | `true` |
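The sketch below shows one plausible reading of how `flags`, `group` and `multiple` from the parameter table interact, using JavaScript's built-in regex engine. The function `extractRegex` and its defaults (`flags` empty, `group` 0, `multiple` false) are assumptions for illustration; the API's actual defaults are not stated here.

```javascript
// Hypothetical sketch of how `flags`, `group` and `multiple` might combine.
// `extractRegex` and its defaults are assumptions, not documented API behavior.
function extractRegex(content, { pattern, flags = "", group = 0, multiple = false }) {
  if (multiple) {
    // matchAll throws unless the regex has the global flag, so add "g" if needed.
    const re = new RegExp(pattern, flags.includes("g") ? flags : flags + "g");
    return Array.from(content.matchAll(re), (m) => m[group]);
  }
  // A fresh RegExp starts at lastIndex 0, so exec returns the first match.
  const m = new RegExp(pattern, flags).exec(content);
  return m ? m[group] : null;
}

const html = "<p>Flat A £250,000</p><p>Flat B £275,500</p>";
// group 1, all matches: returns ["250,000", "275,500"]
console.log(extractRegex(html, { pattern: "£([0-9,]+)", group: 1, multiple: true }));
// default group 0, first match only: returns "£250,000"
console.log(extractRegex(html, { pattern: "£([0-9,]+)" }));
```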
Extract a price and convert it to a number:

```json
{
  "price": {
    "selector": "/£([0-9,]+)/",
    "type": "regex",
    "dataType": "number",
    "transform": "value.replace(/,/g, '')"
  }
}
```
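The price configuration can be traced step by step in plain JavaScript. This sketch assumes the API returns capture group 1 and applies `transform` before the `"number"` conversion; neither the group default nor that ordering is stated here, and `extractPrice` is a hypothetical name.

```javascript
// Plain-JavaScript walk-through of the price config. Assumes capture group 1
// is returned and `transform` runs before the "number" conversion.
function extractPrice(text) {
  const match = /£([0-9,]+)/.exec(text);
  if (!match) return null; // no "£" amount in the text
  const value = match[1]; // e.g. "425,000"
  const cleaned = value.replace(/,/g, ""); // same expression as `transform`
  return Number(cleaned); // mimics dataType: "number"
}

console.log(extractPrice("Offers over £425,000")); // 425000
console.log(Number("425,000")); // NaN, which is why the commas are stripped first
```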
Extract every postcode from a property card's address element:

```json
{
  "postcode": {
    "selector": "address.propertyCard-address",
    "type": "regex",
    "pattern": "/\\b([A-Z]{1,2}\\d[A-Z\\d]?\\s\\d[A-Z]{2})\\b/",
    "multiple": true
  }
}
```
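To see what the postcode pattern accepts, the sketch below runs it with JavaScript's engine over made-up address text, using the `g` flag to stand in for `multiple: true`. The pattern has no `i` flag, so lowercase postcodes are not matched, and it requires whitespace between the two halves of the postcode. `findPostcodes` is a hypothetical helper.

```javascript
// Applies the postcode pattern from the config to made-up address text.
// The g flag stands in for `multiple: true` (return every match).
const postcodeRe = /\b([A-Z]{1,2}\d[A-Z\d]?\s\d[A-Z]{2})\b/g;

function findPostcodes(text) {
  // m[1] is capture group 1, the postcode itself.
  return Array.from(text.matchAll(postcodeRe), (m) => m[1]);
}

console.log(findPostcodes("Flat 2, Oxford Road, Manchester M1 1AE")); // ["M1 1AE"]
console.log(findPostcodes("Offices in LS1 4DY and SW1A 2AA")); // ["LS1 4DY", "SW1A 2AA"]
console.log(findPostcodes("manchester m1 1ae")); // [] because the pattern is case-sensitive
```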
Extract all URLs, with flags given inline after the closing slash:

```json
{
  "url": {
    "selector": "/https?:\\/\\/[\\w.-]+\\.[a-z]{2,}\\/[^\\s\"]+/gi",
    "type": "regex",
    "multiple": true
  }
}
```
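The URL pattern, rewritten as a JavaScript regex literal, is tested below against sample markup. The pattern requires a `/` and at least one further character after the domain, so a bare `https://example.com` is not matched. It also stops at a double quote, which keeps matches inside `href` attributes clean. `findUrls` is a hypothetical helper.

```javascript
// The config's URL pattern as a regex literal: the JSON escapes "\\/" and "\""
// become \/ and " once the JSON string is parsed.
const urlRe = /https?:\/\/[\w.-]+\.[a-z]{2,}\/[^\s"]+/gi;

function findUrls(html) {
  // String.prototype.match with the g flag returns all full matches, or null.
  return html.match(urlRe) ?? [];
}

const snippet = '<a href="https://www.example.com/property/123">View</a>';
console.log(findUrls(snippet)); // ["https://www.example.com/property/123"]
console.log(findUrls("See https://example.com for details")); // [] (no path after the domain)
```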
Extract UK phone numbers:

```json
{
  "phone": {
    "selector": "/(?:\\+44|0)\\s?\\d{2,4}\\s?\\d{3,4}\\s?\\d{3,4}/",
    "type": "regex",
    "multiple": true
  }
}
```
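The sketch below checks the phone pattern against made-up numbers in both `0…` and `+44…` forms, using the `g` flag in place of `multiple: true`. Because the pattern has no leading `\b`, a match can begin partway through a longer run of digits. `findPhones` is a hypothetical helper.

```javascript
// The phone pattern from the config; the g flag stands in for `multiple: true`.
const phoneRe = /(?:\+44|0)\s?\d{2,4}\s?\d{3,4}\s?\d{3,4}/g;

function findPhones(text) {
  // match() with the g flag returns every full match, or null when none exist.
  return text.match(phoneRe) ?? [];
}

console.log(findPhones("Call 020 7946 0018 or +44 20 7946 0019"));
// ["020 7946 0018", "+44 20 7946 0019"]
console.log(findPhones("Mobile: 07700 900123")); // ["07700 900123"]
// No leading \b: the match starts at the "0" inside "1020".
console.log(findPhones("1020 7946 0018")); // ["020 7946 0018"]
```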
Extract every run of digits from the `.content` element:

```json
{
  "number": {
    "selector": ".content",
    "type": "regex",
    "pattern": "/\\d+/",
    "multiple": true
  }
}
```
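On text containing thousands separators, `\d+` splits a single figure into several matches. The sketch below shows this on made-up listing text and compares it with one possible alternative, `\d[\d,]*`. That alternative is an assumption offered for illustration, not a pattern from the API documentation, and it keeps a trailing comma when a number is followed directly by one (for example, `2,`).

```javascript
// Made-up listing text standing in for the `.content` element's text.
const text = "3 bedrooms, 2 bathrooms, 1,250 sq ft";

// \d+ stops at any non-digit, so "1,250" is split into "1" and "250".
console.log(text.match(/\d+/g)); // ["3", "2", "1", "250"]

// Alternative (an assumption): start with a digit, then allow digits or commas,
// keeping thousands separators together.
console.log(text.match(/\d[\d,]*/g)); // ["3", "2", "1,250"]
```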
Commonly used patterns:

| Description | Pattern |
|---|---|
| UK postcode | `/\b([A-Z]{1,2}\d[A-Z\d]?\s\d[A-Z]{2})\b/` |
| UK phone | `/(?:\+44\|0)\s?\d{2,4}\s?\d{3,4}\s?\d{3,4}/` |
| Email | `/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/` |
| Price (£, with pence) | `/£([\d,]+\.\d{2})/` |
| Date (DD/MM/YYYY) | `/\b(0[1-9]\|[12][0-9]\|3[01])\/(0[1-9]\|1[012])\/(19\|20)\d{2}\b/` |
| URL | `/https?:\/\/[^\s]+/` |
| Number | `/[\d,]+/` |
| Text between tags | `/<tag>(.*?)<\/tag>/` |
| IPv4 address | `/\b(?:\d{1,3}\.){3}\d{1,3}\b/` |
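The sketch below runs three of the table's patterns against made-up strings using JavaScript's engine. It also shows that these patterns check format only: an impossible date such as 31/02 or an out-of-range address such as 999.999.999.999 still matches. `patterns` and `firstMatch` are hypothetical names.

```javascript
// Three patterns from the table, as JavaScript regex literals.
const patterns = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  date: /\b(0[1-9]|[12][0-9]|3[01])\/(0[1-9]|1[012])\/(19|20)\d{2}\b/,
  ipAddress: /\b(?:\d{1,3}\.){3}\d{1,3}\b/,
};

// Returns the first full match (group 0) for the named pattern, or null.
function firstMatch(name, text) {
  const m = patterns[name].exec(text);
  return m ? m[0] : null;
}

console.log(firstMatch("email", "Contact lettings@example.co.uk today")); // lettings@example.co.uk
console.log(firstMatch("date", "Available from 01/09/2024")); // 01/09/2024
// Format-only checks: neither value below is valid, but both match.
console.log(firstMatch("date", "Listed 31/02/2024")); // 31/02/2024
console.log(firstMatch("ipAddress", "Host 999.999.999.999")); // 999.999.999.999
```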
Avoid broad greedy patterns such as `(.*)` when possible.

For more examples, see the Rightmove config example.