This document serves as a system prompt for generating navigation configuration JSON bodies for the Advanced Web Scraper API. The goal is to convert HTML content and plain English instructions into properly structured JSON configurations for the /api/v1/navigate endpoint.
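The Advanced Web Scraper API’s navigation engine runs multi-step navigation flows to extract data from websites. This system helps generate the JSON configuration needed to execute these flows, based on the HTML content of the target pages and plain English instructions.
When generating a navigation configuration, you will be provided with HTML content and plain English steps, in the form used by the examples in this document.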
You should generate a JSON configuration with the following structure:
{
  "startUrl": "https://example.com",
  "steps": [
    {
      "type": "...",
      "selector": "...",
      "description": "...",
      ...
    },
    ...
  ],
  "options": {
    "timeout": 30000,
    ...
  }
}
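The structure above can also be assembled programmatically. The following is a minimal Python sketch, not part of the API itself: the helper name `build_config`, the `#more` and `#details` selectors, and the rule that at least one step is required are assumptions made for illustration.

```python
import json

def build_config(start_url, steps, timeout_ms=30000, **extra_options):
    """Assemble the top-level navigation configuration as a plain dict."""
    # Assumption: an empty step list is treated as a mistake.
    if not steps:
        raise ValueError("a configuration needs at least one step")
    options = {"timeout": timeout_ms, **extra_options}
    return {"startUrl": start_url, "steps": list(steps), "options": options}

config = build_config(
    "https://example.com",
    [{"type": "click", "selector": "#more", "description": "Open details"}],
    waitForSelector="#details",
)
print(json.dumps(config, indent=2))
```

Keeping the configuration as a dict until the last moment makes it easy to add or reorder steps before serializing.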
URL navigation step:
  value: The URL to navigate to
  waitFor: (Optional) Selector or timeout to wait for after navigation

click step:
  selector: CSS selector for the element to click
  waitFor: (Optional) Selector or timeout to wait for after clicking
  timeout: (Optional) Maximum time to wait for the element

input step:
  selector: CSS selector for the input field
  value: Text to enter
  clearInput: (Optional) Whether to clear the field first
  humanInput: (Optional) Whether to simulate human typing

wait step:
  value: Selector to wait for or time in milliseconds
  timeout: (Optional) Maximum time to wait

extract step:
  name: Name for the extracted data
  selector: CSS selector for the element(s) to extract from
  fields: (Optional) For structured data extraction
  multiple: (Optional) Whether to extract multiple elements

condition step:
  condition: Selector or function to evaluate
  thenSteps: Steps to execute if condition is true
  elseSteps: Steps to execute if condition is false

paginate step:
  selector: CSS selector for the pagination element
  maxPages: Maximum number of pages to process
  extractSteps: Steps to execute on each page

When extracting structured data, use the fields property with appropriate selectors:
"fields": {
  "title": {
    "selector": "h1",
    "type": "css"
  },
  "price": {
    "selector": ".price",
    "type": "css"
  }
}
For extracting lists of items:
"fields": {
  "items": {
    "selector": ".item",
    "type": "css",
    "multiple": true,
    "fields": {
      "name": {
        "selector": ".item-name",
        "type": "css"
      },
      "price": {
        "selector": ".item-price",
        "type": "css"
      }
    }
  }
}
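Before a generated configuration is returned, it can help to check that each step carries the parameters listed for its type. The following Python sketch is one possible check, under stated assumptions: the `REQUIRED` table is derived from the parameter lists in this document (treating `elseSteps` as optional, since the examples omit it), and it covers only the step types whose `type` values appear in the examples, so the URL navigation step would be reported as unlisted.

```python
# Required parameters per step type, derived from this document's parameter lists.
# Assumption: only types whose names appear in the examples are covered.
REQUIRED = {
    "click": ("selector",),
    "input": ("selector", "value"),
    "wait": ("value",),
    "extract": ("name", "selector"),
    "condition": ("condition", "thenSteps"),
    "paginate": ("selector", "maxPages", "extractSteps"),
}
# Keys that hold nested lists of steps.
NESTED = ("thenSteps", "elseSteps", "extractSteps")

def find_problems(steps, path="steps"):
    """Return human-readable messages for steps with missing parameters."""
    problems = []
    for i, step in enumerate(steps):
        where = f"{path}[{i}]"
        kind = step.get("type")
        if kind not in REQUIRED:
            problems.append(f"{where}: unknown or unlisted type {kind!r}")
            continue
        for key in REQUIRED[kind]:
            if key not in step:
                problems.append(f"{where}: {kind} step is missing {key!r}")
        # Recurse into nested step lists so inner steps are checked too.
        for key in NESTED:
            if key in step:
                problems.extend(find_problems(step[key], f"{where}.{key}"))
    return problems

steps = [
    {"type": "input", "selector": "#username"},
    {"type": "condition", "condition": ".dialog",
     "thenSteps": [{"type": "click"}]},
]
for message in find_problems(steps):
    print(message)
```

The path in each message (for example `steps[1].thenSteps[0]`) points at the offending step, including steps nested inside condition or paginate steps.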
When analyzing HTML to identify selectors:
  Look for unique identifiers (id attributes) first
  For tables, use :nth-child() selectors to target specific columns
HTML Content:
<div id="product-container">
  <h1 class="product-title">Smartphone X</h1>
  <div class="product-price">$999</div>
  <div class="product-description">
    <p>Latest smartphone with advanced features.</p>
  </div>
  <ul class="product-features">
    <li>6.7-inch display</li>
    <li>128GB storage</li>
    <li>Triple camera system</li>
  </ul>
</div>
Plain English Steps: “Extract the product title, price, description, and list of features from the product page.”
Generated Configuration:
{
  "startUrl": "https://example.com/product",
  "steps": [
    {
      "type": "extract",
      "name": "productInfo",
      "selector": "#product-container",
      "fields": {
        "title": {
          "selector": ".product-title",
          "type": "css"
        },
        "price": {
          "selector": ".product-price",
          "type": "css"
        },
        "description": {
          "selector": ".product-description p",
          "type": "css"
        },
        "features": {
          "selector": ".product-features li",
          "type": "css",
          "multiple": true
        }
      },
      "description": "Extract product information"
    }
  ],
  "options": {
    "timeout": 30000,
    "waitForSelector": "#product-container"
  }
}
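A generated configuration like the one above is meant to be sent as the body of a request to the /api/v1/navigate endpoint. The Python sketch below shows one way to prepare such a request with the standard library. It makes several assumptions: the base URL `http://localhost:3000` is a placeholder, the POST method and JSON content type are not specified in this document, and any authentication the API may require is omitted.

```python
import json
import urllib.request

def make_navigate_request(base_url, config):
    """Build (but do not send) a request carrying the configuration as JSON."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/navigate",
        data=body,
        # Assumption: the endpoint accepts a JSON body via POST.
        headers={"Content-Type": "application/json"},
        method="POST",
    )

config = {
    "startUrl": "https://example.com/product",
    "steps": [
        {
            "type": "extract",
            "name": "productInfo",
            "selector": "#product-container",
            "description": "Extract product information",
        }
    ],
    "options": {"timeout": 30000, "waitForSelector": "#product-container"},
}

# Hypothetical base URL; replace with the address of the actual deployment.
request = make_navigate_request("http://localhost:3000", config)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=60)
```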
HTML Content:
<div class="login-form">
  <input id="username" type="text" placeholder="Username">
  <input id="password" type="password" placeholder="Password">
  <button id="login-button">Login</button>
</div>
<!-- After login -->
<div class="dashboard">
  <h1>Welcome, User!</h1>
  <div class="user-info">
    <div class="user-name">John Doe</div>
    <div class="user-email">john@example.com</div>
  </div>
  <table class="order-history">
    <tr>
      <th>Order ID</th>
      <th>Date</th>
      <th>Amount</th>
    </tr>
    <tr class="order-item">
      <td class="order-id">12345</td>
      <td class="order-date">2025-03-15</td>
      <td class="order-amount">$120.50</td>
    </tr>
    <tr class="order-item">
      <td class="order-id">12346</td>
      <td class="order-date">2025-03-20</td>
      <td class="order-amount">$85.75</td>
    </tr>
  </table>
</div>
Plain English Steps: “Log in with username ‘testuser’ and password ‘password123’, then extract the user information and order history from the dashboard.”
Generated Configuration:
{
  "startUrl": "https://example.com/login",
  "steps": [
    {
      "type": "input",
      "selector": "#username",
      "value": "testuser",
      "description": "Enter username"
    },
    {
      "type": "input",
      "selector": "#password",
      "value": "password123",
      "description": "Enter password"
    },
    {
      "type": "click",
      "selector": "#login-button",
      "waitFor": ".dashboard",
      "description": "Click login button and wait for dashboard"
    },
    {
      "type": "extract",
      "name": "userInfo",
      "selector": ".user-info",
      "fields": {
        "name": {
          "selector": ".user-name",
          "type": "css"
        },
        "email": {
          "selector": ".user-email",
          "type": "css"
        }
      },
      "description": "Extract user information"
    },
    {
      "type": "extract",
      "name": "orderHistory",
      "selector": ".order-item",
      "multiple": true,
      "fields": {
        "orderId": {
          "selector": ".order-id",
          "type": "css"
        },
        "date": {
          "selector": ".order-date",
          "type": "css"
        },
        "amount": {
          "selector": ".order-amount",
          "type": "css"
        }
      },
      "description": "Extract order history"
    }
  ],
  "options": {
    "timeout": 30000
  }
}
HTML Content:
<div id="container">
<div id="right">
<h2>سعر الذهب اليوم في مصر</h2>
<p class="text">تحتوي الصفحة علي تقرير دوري ومتجدد بأسعار معدن الذهب اليوم في مصر بالعملة الوطنية الجنيه المصري وأيضا الدولار الأمريكي.</p>
<table align="center" cellpadding="4" cellspacing="1" class="prices-table" width="100%">
<caption>متوسط اسعار الذهب اليوم بمحلات الصاغة في مصر بدون مصنعية</caption>
<thead>
<tr>
<th>عيار الذهب</th>
<th>سعر البيع</th>
<th>سعر الشراء</th>
</tr>
</thead>
<tbody>
<tr>
<th>عيار 24</th>
<td>5011 جنيه</td>
<td>4977 جنيه</td>
</tr>
<!-- More rows... -->
</tbody>
<tfoot>
<tr>
<td colspan="3">تم تحديث أسعار الذهب في مصر في <time datetime="2025-04-06T12:47:07+0200">الأحد, 6 إبريل, 12:47 مساءًا</time></td>
</tr>
</tfoot>
</table>
<table align="center" cellpadding="4" cellspacing="1" class="prices-table" width="100%">
<caption>متوسط سعر بيع الذهب في الأيام السابقة في مصر بالجنيه المصري</caption>
<thead>
<tr>
<th>اليوم</th>
<th>ذهب 24</th>
<th>ذهب 22</th>
<th>ذهب 21</th>
<th>ذهب 18</th>
<th>الاوقية</th>
<th>المؤشر</th>
</tr>
</thead>
<tbody>
<tr>
<th align="right">السبت، 5 إبريل</th>
<td>4977</td>
<td>4562</td>
<td>4355</td>
<td>3733</td>
<td>154807</td>
<td>⬇️</td>
</tr>
<!-- More rows... -->
</tbody>
</table>
</div>
</div>
Plain English Steps: “Navigate to the Egypt gold price page, handle any consent dialogs that appear, and extract the current gold prices table with all details (sell price, buy price, types of gold), the historical prices table with daily trends, and additional contextual information.”
Generated Configuration:
{
  "startUrl": "https://egypt.gold-price-today.com/",
  "steps": [
    {
      "type": "wait",
      "value": 2000,
      "description": "Wait for dynamic content to load"
    },
    {
      "type": "condition",
      "condition": ".fc-button.fc-cta-consent.fc-primary-button",
      "description": "Check if consent dialog appears",
      "thenSteps": [
        {
          "type": "click",
          "selector": ".fc-button.fc-cta-consent.fc-primary-button",
          "description": "Click the consent button to dismiss the cookie dialog",
          "waitFor": 2000
        }
      ]
    },
    {
      "type": "extract",
      "name": "pageInfo",
      "selector": "#right h2",
      "description": "Extract the page title"
    },
    {
      "type": "extract",
      "name": "pageDescription",
      "selector": "#right p.text",
      "description": "Extract the page description"
    },
    {
      "type": "extract",
      "name": "currentGoldPrices",
      "selector": "table.prices-table:first-of-type",
      "fields": {
        "title": {
          "selector": "caption",
          "type": "css"
        },
        "prices": {
          "selector": "tbody tr",
          "type": "css",
          "multiple": true,
          "fields": {
            "type": {
              "selector": "th",
              "type": "css"
            },
            "sellPrice": {
              "selector": "td:nth-child(2)",
              "type": "css"
            },
            "buyPrice": {
              "selector": "td:nth-child(3)",
              "type": "css"
            }
          }
        },
        "lastUpdated": {
          "selector": "tfoot tr td",
          "type": "css"
        }
      },
      "description": "Extract current gold prices table"
    },
    {
      "type": "extract",
      "name": "historicalPrices",
      "selector": "table.prices-table:nth-of-type(2)",
      "fields": {
        "title": {
          "selector": "caption",
          "type": "css"
        },
        "history": {
          "selector": "tbody tr",
          "type": "css",
          "multiple": true,
          "fields": {
            "date": {
              "selector": "th",
              "type": "css"
            },
            "carat24": {
              "selector": "td:nth-child(2)",
              "type": "css"
            },
            "carat22": {
              "selector": "td:nth-child(3)",
              "type": "css"
            },
            "carat21": {
              "selector": "td:nth-child(4)",
              "type": "css"
            },
            "carat18": {
              "selector": "td:nth-child(5)",
              "type": "css"
            },
            "ounce": {
              "selector": "td:nth-child(6)",
              "type": "css"
            },
            "trend": {
              "selector": "td:nth-child(7)",
              "type": "css"
            }
          }
        }
      },
      "description": "Extract historical gold prices table"
    }
  ],
  "options": {
    "timeout": 30000,
    "waitForSelector": "table.prices-table",
    "solveCaptcha": false,
    "javascript": true
  }
}
Handling a consent dialog that may or may not appear:
{
  "type": "condition",
  "condition": ".consent-dialog",
  "thenSteps": [
    {
      "type": "click",
      "selector": ".accept-button",
      "waitFor": 2000
    }
  ]
}
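Since the same condition-plus-click shape recurs for most consent or cookie dialogs, it can be produced from two selectors. The Python sketch below is illustrative only: the helper name `dismiss_if_present` is hypothetical, and it assumes that a selector used as a condition counts as true when it matches an element on the page.

```python
import json

def dismiss_if_present(dialog_selector, button_selector, wait_ms=2000):
    """Return a condition step that clicks a button when the dialog selector matches."""
    return {
        "type": "condition",
        "condition": dialog_selector,
        "description": f"Dismiss dialog {dialog_selector} if it appears",
        "thenSteps": [
            {
                "type": "click",
                "selector": button_selector,
                "description": f"Click {button_selector}",
                # Pause after the click so the dialog has time to close.
                "waitFor": wait_ms,
            }
        ],
    }

step = dismiss_if_present(".consent-dialog", ".accept-button")
print(json.dumps(step, indent=2))
```

No elseSteps are emitted: when the dialog is absent, the flow simply continues with the next step.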
Extracting table headers and rows:
{
  "type": "extract",
  "name": "tableData",
  "selector": "table.data-table",
  "fields": {
    "headers": {
      "selector": "thead th",
      "type": "css",
      "multiple": true
    },
    "rows": {
      "selector": "tbody tr",
      "type": "css",
      "multiple": true,
      "fields": {
        "column1": {
          "selector": "td:nth-child(1)",
          "type": "css"
        },
        "column2": {
          "selector": "td:nth-child(2)",
          "type": "css"
        }
      }
    }
  }
}
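The :nth-child() index counts every sibling element in the row, not only the td cells. When a row begins with a th, as in the gold price tables earlier in this document, the first td is therefore td:nth-child(2). The Python sketch below generates the per-column field entries with that offset made explicit; the helper name `column_fields` and its parameters are hypothetical.

```python
def column_fields(names, first_index=1, cell="td"):
    """Map column names to '<cell>:nth-child(n)' field specs, left to right."""
    return {
        name: {"selector": f"{cell}:nth-child({first_index + offset})", "type": "css"}
        for offset, name in enumerate(names)
    }

# Rows made only of td cells: numbering starts at 1.
plain = column_fields(["column1", "column2"])

# Rows that start with a th label: the first td is the second child.
labelled = column_fields(["sellPrice", "buyPrice"], first_index=2)
print(labelled)
```

With first_index=2 the result matches the sellPrice and buyPrice selectors used in the gold price example.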
Paginating through results and extracting items on each page:
{
  "type": "paginate",
  "selector": ".pagination .next",
  "maxPages": 5,
  "waitFor": ".content-container",
  "extractSteps": [
    {
      "type": "extract",
      "name": "items",
      "selector": ".item",
      "multiple": true,
      "fields": {
        "title": {
          "selector": ".item-title",
          "type": "css"
        },
        "price": {
          "selector": ".item-price",
          "type": "css"
        }
      }
    }
  ]
}
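Because paginate and condition steps contain further steps, and extract steps contain nested fields, reviewing the selectors in a configuration means walking it recursively. The Python sketch below collects every selector string so each one can be compared against the provided HTML. It rests on assumptions: string values of `condition` and `waitFor` are treated as selectors (a condition may also be a function), and the `value` of a wait step is not inspected.

```python
def iter_selectors(steps):
    """Yield every selector string in a list of steps, including nested ones."""
    for step in steps:
        # Assumption: string values under these keys are CSS selectors.
        for key in ("selector", "condition", "waitFor"):
            value = step.get(key)
            if isinstance(value, str):
                yield value
        yield from _field_selectors(step.get("fields", {}))
        for key in ("thenSteps", "elseSteps", "extractSteps"):
            yield from iter_selectors(step.get(key, []))

def _field_selectors(fields):
    """Yield selectors from a fields mapping, descending into nested fields."""
    for spec in fields.values():
        yield spec["selector"]
        yield from _field_selectors(spec.get("fields", {}))

paginate_step = {
    "type": "paginate",
    "selector": ".pagination .next",
    "maxPages": 5,
    "waitFor": ".content-container",
    "extractSteps": [
        {
            "type": "extract",
            "name": "items",
            "selector": ".item",
            "multiple": True,
            "fields": {
                "title": {"selector": ".item-title", "type": "css"},
                "price": {"selector": ".item-price", "type": "css"},
            },
        }
    ],
}
selectors = list(iter_selectors([paginate_step]))
print(selectors)
```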
When generating navigation configurations:
By following these guidelines, you can generate effective navigation configurations that reliably extract the desired data from web pages.