adv-web-scraper-api

Template API Documentation

The Template API provides endpoints to discover and retrieve pre-defined configuration templates stored within the config-templates directory. These templates serve as examples and starting points for various scraping challenges.

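Templates are organized by site and challenge under config-templates/<site_name>/challenges/<challenge_name>/. Each challenge directory is expected to contain a README.md whose YAML front matter holds the template metadata, plus the config JSON file named by that front matter's path field (e.g., config.json).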

Endpoints

List Templates

Retrieves a list of available configuration templates, optionally filtered by site or tag.
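
The optional filtering can be pictured with a small sketch. It is not the service's actual code: the TemplateSummary shape, the filterTemplates helper, the case-insensitive comparison, and the choice to require both filters when both are given are all assumptions made for illustration.

```typescript
// Hypothetical shape of one list entry; the field names are assumptions,
// not the documented response format.
interface TemplateSummary {
  site: string;
  challenge: string;
  tags: string[];
}

// Both filters are optional, mirroring "optionally filtered by site or tag".
interface TemplateFilter {
  site?: string;
  tag?: string;
}

// Keep a template only if it matches every filter that was supplied.
// Comparison is case-insensitive here, which is an assumption.
function filterTemplates(
  templates: TemplateSummary[],
  filter: TemplateFilter = {},
): TemplateSummary[] {
  const site = filter.site?.toLowerCase();
  const tag = filter.tag?.toLowerCase();
  return templates.filter((t) => {
    const siteOk = site === undefined || t.site.toLowerCase() === site;
    const tagOk =
      tag === undefined || t.tags.some((x) => x.toLowerCase() === tag);
    return siteOk && tagOk;
  });
}
```

Calling filterTemplates(all) with no filter returns every entry, while filterTemplates(all, { tag: "Login" }) keeps only entries tagged Login.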

Get Single Template Details

Retrieves the full details for a specific template, including the content of its config.json file.

Metadata Schema (README.md Front Matter)

The YAML front matter in each challenge’s README.md should adhere to the following structure:

---
title: string (required) - Human-readable title for the challenge.
path: string (required) - The filename of the config JSON (e.g., "config.json"). Must end with .json.
description: string (required) - A brief description of the challenge or template purpose.
tags: string[] (required) - An array of relevant tags (e.g., ["Login", "Pagination", "JavaScript"]). Must have at least one tag.
difficulty: string (optional) - Estimated difficulty ('Beginner', 'Intermediate', 'Advanced', 'Expert').
related_steps: string[] (optional) - Array of relevant step types used in the config (e.g., ["goto", "click", "extract"]).
---
Markdown content explaining the challenge and the template follows here...
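
As an illustration of the layout above, the following sketch separates the front matter from the Markdown body of a README. The splitFrontMatter helper and the sample values are hypothetical, and parsing the extracted YAML text is left to a YAML library of your choice; the project's real scanner may work differently.

```typescript
// Split a README into its raw YAML front matter and the Markdown body.
// Returns null when the text does not start with a '---' delimited block.
function splitFrontMatter(
  readme: string,
): { frontMatter: string; body: string } | null {
  const lines = readme.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return null;

  // Find the closing delimiter, skipping the opening one at index 0.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return null;

  return {
    frontMatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  };
}

// Example README content; every value here is made up for illustration.
const exampleReadme = [
  "---",
  'title: "Example login challenge"',
  'path: "config.json"',
  'description: "Logs in and extracts a table."',
  'tags: ["Login", "Pagination"]',
  "difficulty: Beginner",
  "---",
  "Markdown content explaining the challenge...",
].join("\n");
```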

The TemplateService validates this metadata using a Zod schema. A template whose metadata is invalid or lacks a required field is skipped during the scan, and the failure is logged.
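
To make the rules concrete, here is a dependency-free sketch of the same checks. It is a hand-written stand-in, not the Zod schema the TemplateService uses; validateMetadata and the error messages are hypothetical, and treating difficulty as a closed set of four values is an assumption based on the list above.

```typescript
const DIFFICULTIES = ["Beginner", "Intermediate", "Advanced", "Expert"] as const;

interface TemplateMetadata {
  title: string;
  path: string;
  description: string;
  tags: string[];
  difficulty?: (typeof DIFFICULTIES)[number];
  related_steps?: string[];
}

type ValidationResult =
  | { ok: true; value: TemplateMetadata }
  | { ok: false; errors: string[] };

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// Check parsed front matter against the documented rules and collect
// every problem instead of stopping at the first one.
function validateMetadata(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["metadata must be an object"] };
  }
  const { title, path, description, tags, difficulty, related_steps } =
    input as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof title !== "string") errors.push("title: required string");
  if (typeof path !== "string" || !path.endsWith(".json")) {
    errors.push("path: required string ending with .json");
  }
  if (typeof description !== "string") {
    errors.push("description: required string");
  }
  if (!isStringArray(tags) || tags.length === 0) {
    errors.push("tags: required array with at least one string");
  }
  if (
    difficulty !== undefined &&
    (typeof difficulty !== "string" ||
      !(DIFFICULTIES as readonly string[]).includes(difficulty))
  ) {
    errors.push("difficulty: must be one of " + DIFFICULTIES.join(", "));
  }
  if (related_steps !== undefined && !isStringArray(related_steps)) {
    errors.push("related_steps: must be an array of strings");
  }

  return errors.length === 0
    ? { ok: true, value: input as TemplateMetadata }
    : { ok: false, errors };
}
```

A scanner built on this could skip any template whose result has ok set to false and log the collected errors alongside the template's directory.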