The Storage module provides an extensible system for storing, retrieving, updating, and deleting extraction results. All storage implementations share a common interface (the Strategy pattern), so the application can switch between them without changing client code.
The module consists of the following components: the Memory, File, MongoDB, Redis, and API storage adapters, the Storage Factory, and the Storage Service.
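The common interface itself is not shown in this section. As a minimal sketch, assuming the method names and return values seen in the usage examples, it could look like the following. The type names `ExtractionResult`, `ListOptions`, and `StorageAdapter` and the helper `markCompleted` are hypothetical and may differ from the real module.

```typescript
// Hypothetical types inferred from the usage examples; the real module may differ.
interface ExtractionResult {
  id: string;
  status: string;
  [key: string]: unknown;
}

interface ListOptions {
  limit?: number;
  offset?: number;
}

// The common contract that every storage strategy would implement.
interface StorageAdapter {
  initialize(): Promise<void>;
  store(result: ExtractionResult): Promise<string>;
  retrieve(id: string): Promise<ExtractionResult | null>;
  update(id: string, changes: Partial<ExtractionResult>): Promise<boolean>;
  delete(id: string): Promise<boolean>;
  list(options?: ListOptions): Promise<ExtractionResult[]>;
  close(): Promise<void>;
}

// Client code depends only on the interface, so any adapter can be passed in.
async function markCompleted(storage: StorageAdapter, id: string): Promise<boolean> {
  const existing = await storage.retrieve(id);
  if (existing === null) return false;
  return storage.update(id, { status: "completed" });
}
```

Because `markCompleted` only knows about `StorageAdapter`, swapping the memory adapter for a file or MongoDB adapter would not require any change to it.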

The Memory Storage adapter stores extraction results in memory using a Map. This is useful for development and testing, but data is lost when the server restarts.
Usage:
import { MemoryStorageAdapter } from '../../storage/index.js';
const memoryStorage = new MemoryStorageAdapter();
await memoryStorage.initialize();
// Store an extraction result (extractionResult is assumed to be defined by the caller)
const id = await memoryStorage.store(extractionResult);
// Retrieve an extraction result
const result = await memoryStorage.retrieve(id);
// Update an extraction result
const success = await memoryStorage.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await memoryStorage.delete(id);
// List extraction results
const results = await memoryStorage.list({ limit: 10, offset: 0 });
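To illustrate how a Map can back these operations, here is a minimal sketch. It is not the actual `MemoryStorageAdapter`; the class name `MapBackedStore`, the `StoredResult` type, and the default `limit` of 10 are assumptions made for the example.

```typescript
// Hypothetical record shape; only the id is required for this sketch.
type StoredResult = { id: string; [key: string]: unknown };

class MapBackedStore {
  // All data lives in this Map, so it disappears when the process exits.
  private readonly items = new Map<string, StoredResult>();

  async store(result: StoredResult): Promise<string> {
    this.items.set(result.id, { ...result });
    return result.id;
  }

  async retrieve(id: string): Promise<StoredResult | null> {
    return this.items.get(id) ?? null;
  }

  // Merge the changes into the existing record; report false if the id is unknown.
  async update(id: string, changes: Partial<StoredResult>): Promise<boolean> {
    const existing = this.items.get(id);
    if (!existing) return false;
    this.items.set(id, { ...existing, ...changes, id });
    return true;
  }

  async delete(id: string): Promise<boolean> {
    return this.items.delete(id);
  }

  // Maps iterate in insertion order, so offset and limit are stable between calls.
  async list(options: { limit?: number; offset?: number } = {}): Promise<StoredResult[]> {
    const { limit = 10, offset = 0 } = options;
    return Array.from(this.items.values()).slice(offset, offset + limit);
  }
}
```

The methods are `async` only to mirror the awaited calls in the usage example; nothing in a Map-backed store needs to wait on I/O.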
The File Storage adapter stores extraction results as JSON files in the file system. This provides persistence between server restarts and is suitable for small to medium-sized datasets.
Usage:
import { FileStorageAdapter } from '../../storage/index.js';
const fileStorage = new FileStorageAdapter({
  directory: './data/extraction-results',
  fileExtension: '.json',
  prettyPrint: true,
});
await fileStorage.initialize();
// Store an extraction result
const id = await fileStorage.store(extractionResult);
// Retrieve an extraction result
const result = await fileStorage.retrieve(id);
// Update an extraction result
const success = await fileStorage.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await fileStorage.delete(id);
// List extraction results
const results = await fileStorage.list({ limit: 10, offset: 0 });
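The `directory`, `fileExtension`, and `prettyPrint` options suggest one JSON file per result. The sketch below shows how a file path and file contents might be derived from them. The helper names `filePathFor` and `serialize` and the id-sanitizing rule are assumptions for illustration, not the adapter's actual behavior.

```typescript
// Hypothetical helpers; the names and the sanitizing rule are assumptions.
interface FileLayoutOptions {
  directory: string;
  fileExtension: string;
  prettyPrint: boolean;
}

function filePathFor(id: string, opts: FileLayoutOptions): string {
  // Replace anything outside a conservative character set so an id
  // such as "../secret" cannot point outside the storage directory.
  const safeId = id.replace(/[^A-Za-z0-9_-]/g, "_");
  // Drop trailing slashes so the path never contains "//".
  const dir = opts.directory.replace(/\/+$/, "");
  return `${dir}/${safeId}${opts.fileExtension}`;
}

function serialize(result: object, opts: FileLayoutOptions): string {
  // prettyPrint trades file size for readability (two-space indentation).
  return opts.prettyPrint ? JSON.stringify(result, null, 2) : JSON.stringify(result);
}

const layout: FileLayoutOptions = {
  directory: "./data/extraction-results",
  fileExtension: ".json",
  prettyPrint: true,
};
console.log(filePathFor("extract_123", layout));
// prints "./data/extraction-results/extract_123.json"
```

Writing the serialized string to that path (for example with Node's `fs/promises`) would complete a store operation.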
The MongoDB Storage adapter stores extraction results in a MongoDB database. This provides scalable and queryable storage suitable for large datasets.
Usage:
import { MongoDBStorageAdapter } from '../../storage/index.js';
const mongodbStorage = new MongoDBStorageAdapter({
  uri: 'mongodb://localhost:27017/web-scraper',
  database: 'web-scraper',
  collection: 'extraction-results',
});
await mongodbStorage.initialize();
// Store an extraction result
const id = await mongodbStorage.store(extractionResult);
// Retrieve an extraction result
const result = await mongodbStorage.retrieve(id);
// Update an extraction result
const success = await mongodbStorage.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await mongodbStorage.delete(id);
// List extraction results
const results = await mongodbStorage.list({ limit: 10, offset: 0 });
The Redis Storage adapter stores extraction results in Redis. This provides high-performance caching and storage with optional expiration.
Usage:
import { RedisStorageAdapter } from '../../storage/index.js';
const redisStorage = new RedisStorageAdapter({
  host: 'localhost',
  port: 6379,
  keyPrefix: 'extraction:',
  expireTime: 3600, // 1 hour
});
await redisStorage.initialize();
// Store an extraction result
const id = await redisStorage.store(extractionResult);
// Retrieve an extraction result
const result = await redisStorage.retrieve(id);
// Update an extraction result
const success = await redisStorage.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await redisStorage.delete(id);
// List extraction results
const results = await redisStorage.list({ limit: 10, offset: 0 });
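To show how `keyPrefix` and `expireTime` could translate into a Redis write, the sketch below builds the arguments of a Redis `SET` command, using its `EX` option for expiration in seconds. The function `buildSetCommand` is hypothetical; the real adapter presumably goes through a Redis client library rather than building raw commands.

```typescript
// Hypothetical helper: builds the raw arguments of a Redis SET command.
function buildSetCommand(
  id: string,
  result: object,
  keyPrefix: string = "extraction:",
  expireTime: number = 0,
): string[] {
  // The prefix namespaces all extraction keys within the Redis database.
  const command = ["SET", `${keyPrefix}${id}`, JSON.stringify(result)];
  // An expireTime of 0 means "no expiration", so EX is added only for positive values.
  if (expireTime > 0) command.push("EX", String(expireTime));
  return command;
}

console.log(buildSetCommand("extract_123", { status: "completed" }, "extraction:", 3600));
// prints [ 'SET', 'extraction:extract_123', '{"status":"completed"}', 'EX', '3600' ]
```

With an expiration set, Redis removes the key on its own, which is what makes the adapter usable as a cache.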
The API Storage adapter stores extraction results through an external HTTP API, which allows integration with external systems and services.
Usage:
import { ApiStorageAdapter } from '../../storage/index.js';
const apiStorage = new ApiStorageAdapter({
  baseUrl: 'https://api.example.com',
  endpoints: {
    store: '/store',
    retrieve: '/retrieve',
    update: '/update',
    delete: '/delete',
    list: '/list',
  },
  auth: {
    type: 'bearer',
    token: 'your-api-token',
  },
  retry: {
    maxRetries: 3,
    retryDelay: 1000,
  },
});
await apiStorage.initialize();
// Store an extraction result
const id = await apiStorage.store(extractionResult);
// Retrieve an extraction result
const result = await apiStorage.retrieve(id);
// Update an extraction result
const success = await apiStorage.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await apiStorage.delete(id);
// List extraction results
const results = await apiStorage.list({ limit: 10, offset: 0 });
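The `retry` options imply that failed requests are attempted again after a delay. A minimal sketch of such a loop follows. It assumes that `maxRetries` counts retries after the first attempt and that `retryDelay` is a fixed delay in milliseconds; the helper `withRetry` is hypothetical and the adapter's real policy (for example backoff, or which errors are retried) may differ.

```typescript
interface RetryOptions {
  maxRetries: number; // retries after the first attempt (assumption)
  retryDelay: number; // fixed delay in milliseconds (assumption)
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs the operation, retrying on any thrown error until retries are exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries, retryDelay }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxRetries) await sleep(retryDelay);
    }
  }
  throw lastError;
}
```

With `maxRetries: 3` this makes at most four attempts and rethrows the last error if all of them fail.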
The Storage Factory is a singleton that creates, reuses, and closes storage adapters, giving the application one central place to obtain them.
Usage:
import { StorageFactory } from '../../storage/index.js';
// Get the storage factory instance
const storageFactory = StorageFactory.getInstance();
// Get a memory storage adapter
const memoryStorage = await storageFactory.getAdapter('memory');
// Get a file storage adapter with options
const fileStorage = await storageFactory.getAdapter('file', {
  directory: './data/extraction-results',
  prettyPrint: true,
});
// Get a MongoDB storage adapter with options
const mongodbStorage = await storageFactory.getAdapter('mongodb', {
  uri: 'mongodb://localhost:27017/web-scraper',
});
// Close all adapters when done
await storageFactory.closeAll();
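How the factory combines the Factory and Singleton patterns can be sketched as follows. This is an illustration only: `SketchFactory`, `SketchAdapter`, and the stub adapters are hypothetical, and caching adapters by type plus serialized options is an assumption about how reuse might work.

```typescript
interface SketchAdapter {
  readonly type: string;
  closed: boolean;
  close(): Promise<void>;
}

// Stand-in for constructing a real adapter of the requested type.
function createStubAdapter(type: string): SketchAdapter {
  const adapter: SketchAdapter = {
    type,
    closed: false,
    close: async () => { adapter.closed = true; },
  };
  return adapter;
}

class SketchFactory {
  private static instance: SketchFactory | undefined;
  private readonly cache = new Map<string, SketchAdapter>();

  // Private constructor: the only way to get a factory is getInstance().
  private constructor() {}

  static getInstance(): SketchFactory {
    if (!SketchFactory.instance) SketchFactory.instance = new SketchFactory();
    return SketchFactory.instance;
  }

  // Reuse an adapter when the same type and options are requested again.
  async getAdapter(type: string, options: Record<string, unknown> = {}): Promise<SketchAdapter> {
    const key = `${type}:${JSON.stringify(options)}`;
    let adapter = this.cache.get(key);
    if (!adapter) {
      adapter = createStubAdapter(type);
      this.cache.set(key, adapter);
    }
    return adapter;
  }

  // Close every cached adapter and forget it.
  async closeAll(): Promise<void> {
    await Promise.all(Array.from(this.cache.values(), (a) => a.close()));
    this.cache.clear();
  }
}
```

Keeping the cache inside a single shared instance is what lets `closeAll()` release every connection the application has opened.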
The Storage Service is a singleton that provides a unified, high-level API for storage operations, with support for a primary adapter and an optional backup adapter.
Usage:
import { StorageService } from '../../storage/index.js';
// Get the storage service instance with options
const storageService = StorageService.getInstance({
  primaryAdapter: 'mongodb',
  primaryAdapterOptions: {
    uri: 'mongodb://localhost:27017/web-scraper',
  },
  backupAdapter: 'file',
  backupAdapterOptions: {
    directory: './data/backup',
  },
  useBackup: true,
});
// Initialize the storage service
await storageService.initialize();
// Store an extraction result
const id = await storageService.store(extractionResult);
// Retrieve an extraction result
const result = await storageService.retrieve(id);
// Update an extraction result
const success = await storageService.update(id, { status: 'updated' });
// Delete an extraction result
const deleted = await storageService.delete(id);
// List extraction results
const results = await storageService.list({ limit: 10, offset: 0 });
// Change the primary storage adapter
await storageService.changePrimaryAdapter('redis', {
  host: 'localhost',
  port: 6379,
});
// Change the backup storage adapter
await storageService.changeBackupAdapter('api', {
  baseUrl: 'https://api.example.com',
});
// Close the storage service when done
await storageService.close();
Basic usage:
import { StorageService } from '../../storage/index.js';
// Get the storage service instance
const storageService = StorageService.getInstance();
// Initialize the storage service
await storageService.initialize();
// Store an extraction result
const extractionResult = {
  id: 'extract_123',
  url: 'https://example.com',
  status: 'completed',
  data: { title: 'Example Page' },
  timestamp: new Date().toISOString(),
};
const id = await storageService.store(extractionResult);
// Retrieve an extraction result
const result = await storageService.retrieve(id);
console.log('Retrieved result:', result);
// Update an extraction result
const success = await storageService.update(id, { status: 'updated' });
console.log('Update success:', success);
// List extraction results
const results = await storageService.list({ limit: 10, offset: 0 });
console.log('Listed results:', results);
// Delete an extraction result
const deleted = await storageService.delete(id);
console.log('Delete success:', deleted);
// Close the storage service when done
await storageService.close();
Using primary and backup storage:
import { StorageService } from '../../storage/index.js';
// Get the storage service instance with primary and backup adapters
const storageService = StorageService.getInstance({
  primaryAdapter: 'mongodb',
  primaryAdapterOptions: {
    uri: 'mongodb://localhost:27017/web-scraper',
  },
  backupAdapter: 'file',
  backupAdapterOptions: {
    directory: './data/backup',
  },
  useBackup: true,
});
// Initialize the storage service
await storageService.initialize();
// Store an extraction result (stored in both primary and backup)
const id = await storageService.store(extractionResult);
// Retrieve an extraction result (tries primary first, then backup)
const result = await storageService.retrieve(id);
// Update an extraction result (updates both primary and backup)
const success = await storageService.update(id, { status: 'updated' });
// Delete an extraction result (deletes from both primary and backup)
const deleted = await storageService.delete(id);
// Close the storage service when done
await storageService.close();
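The comments in the example above describe writes going to both adapters and reads trying the primary first. A minimal sketch of that behavior follows. The names `MirroredStorage`, `MiniAdapter`, and `mapAdapter` are hypothetical, and treating a failing primary like a miss is an assumption; the real service may handle errors differently.

```typescript
// Hypothetical minimal shapes; the real adapters expose more methods.
type Rec = { id: string; [key: string]: unknown };

interface MiniAdapter {
  store(result: Rec): Promise<string>;
  retrieve(id: string): Promise<Rec | null>;
}

class MirroredStorage {
  private readonly primary: MiniAdapter;
  private readonly backup: MiniAdapter | undefined;

  constructor(primary: MiniAdapter, backup?: MiniAdapter) {
    this.primary = primary;
    this.backup = backup;
  }

  // Write to the primary first, then mirror the same record to the backup.
  async store(result: Rec): Promise<string> {
    const id = await this.primary.store(result);
    if (this.backup) await this.backup.store(result);
    return id;
  }

  // Read from the primary; fall back to the backup on a miss or an error.
  async retrieve(id: string): Promise<Rec | null> {
    try {
      const found = await this.primary.retrieve(id);
      if (found !== null) return found;
    } catch {
      // Assumption: a failing primary is treated like a miss.
    }
    return this.backup ? this.backup.retrieve(id) : null;
  }
}

// Tiny in-memory adapter used only to exercise the sketch.
function mapAdapter(): MiniAdapter & { rows: Map<string, Rec> } {
  const rows = new Map<string, Rec>();
  return {
    rows,
    store: async (result: Rec) => { rows.set(result.id, result); return result.id; },
    retrieve: async (id: string) => rows.get(id) ?? null,
  };
}
```

If a record is lost from the primary, `retrieve` still finds the mirrored copy in the backup, which is the point of enabling `useBackup`.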
Changing the storage adapter at runtime:
import { StorageService } from '../../storage/index.js';
// Get the storage service instance with memory adapter
const storageService = StorageService.getInstance({
  primaryAdapter: 'memory',
});
// Initialize the storage service
await storageService.initialize();
// Store some extraction results (extractionResult1 and extractionResult2 are assumed to be defined by the caller)
await storageService.store(extractionResult1);
await storageService.store(extractionResult2);
// Change the primary storage adapter to MongoDB
// This will automatically migrate all data from memory to MongoDB
await storageService.changePrimaryAdapter('mongodb', {
  uri: 'mongodb://localhost:27017/web-scraper',
});
// Now all operations will use MongoDB
const results = await storageService.list();
console.log('Results from MongoDB:', results);
// Close the storage service when done
await storageService.close();
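The migration mentioned in the example above could be implemented by paging through the old adapter with `list` and writing each record to the new one with `store`. The sketch below shows that idea. The function `migrateAll`, its `pageSize` parameter, and the two interfaces are hypothetical, and it assumes that `list` returns records in a stable order between calls.

```typescript
// Hypothetical minimal shapes for the two sides of a migration.
type MigratedRecord = { id: string; [key: string]: unknown };

interface MigrationSource {
  list(options: { limit: number; offset: number }): Promise<MigratedRecord[]>;
}

interface MigrationTarget {
  store(result: MigratedRecord): Promise<string>;
}

// Copies every record from one adapter to another, one page at a time,
// and returns the number of records copied.
async function migrateAll(
  from: MigrationSource,
  to: MigrationTarget,
  pageSize: number = 100,
): Promise<number> {
  if (pageSize < 1) throw new RangeError("pageSize must be at least 1");
  let offset = 0;
  let copied = 0;
  for (;;) {
    const page = await from.list({ limit: pageSize, offset });
    for (const item of page) {
      await to.store(item);
      copied++;
    }
    // A short (or empty) page means the source has been exhausted.
    if (page.length < pageSize) return copied;
    offset += pageSize;
  }
}
```

Paging keeps memory use bounded when the source holds many records, at the cost of one extra `list` call when the total is an exact multiple of `pageSize`.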
Configuration options:
interface MemoryStorageOptions {
  // No specific options for memory storage
}
interface FileStorageOptions {
  directory?: string; // Default: './data/extraction-results'
  fileExtension?: string; // Default: '.json'
  createDirectory?: boolean; // Default: true
  prettyPrint?: boolean; // Default: true
}
interface MongoDBStorageOptions {
  uri?: string; // Default: config.mongodb.uri
  database?: string; // Default: 'web-scraper'
  collection?: string; // Default: 'extraction-results'
  options?: any; // MongoDB connection options
}
interface RedisStorageOptions {
  host?: string; // Default: config.redis.host
  port?: number; // Default: config.redis.port
  password?: string;
  db?: number; // Default: 0
  keyPrefix?: string; // Default: 'extraction:'
  expireTime?: number; // Default: 0 (no expiration)
}
interface ApiStorageOptions {
  baseUrl: string;
  endpoints?: {
    store?: string; // Default: '/store'
    retrieve?: string; // Default: '/retrieve'
    update?: string; // Default: '/update'
    delete?: string; // Default: '/delete'
    list?: string; // Default: '/list'
  };
  auth?: {
    type: 'basic' | 'bearer' | 'api-key';
    username?: string; // For basic auth
    password?: string; // For basic auth
    token?: string; // For bearer auth
    apiKeyName?: string; // For API key auth
    apiKeyValue?: string; // For API key auth
    apiKeyLocation?: 'header' | 'query'; // Default: 'header'
  };
  timeout?: number; // Default: 30000
  headers?: Record<string, string>;
  retry?: {
    maxRetries?: number; // Default: 3
    retryDelay?: number; // Default: 1000
  };
}
interface StorageFactoryOptions {
  defaultAdapter?: 'memory' | 'file' | 'mongodb' | 'redis' | 'api'; // Default: 'memory'
  adapterOptions?: {
    memory?: MemoryStorageOptions;
    file?: FileStorageOptions;
    mongodb?: MongoDBStorageOptions;
    redis?: RedisStorageOptions;
    api?: ApiStorageOptions;
  };
}
interface StorageServiceOptions {
  primaryAdapter?: 'memory' | 'file' | 'mongodb' | 'redis' | 'api'; // Default: 'memory'
  primaryAdapterOptions?: StorageAdapterOptions;
  backupAdapter?: 'memory' | 'file' | 'mongodb' | 'redis' | 'api';
  backupAdapterOptions?: StorageAdapterOptions;
  useBackup?: boolean; // Default: false
}
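The `auth` block of `ApiStorageOptions` covers three schemes. As a sketch of how those options might be turned into request headers or query parameters, consider the helper below. `buildAuth`, `AuthParts`, and the fallback key name `X-API-Key` are assumptions for illustration; only the option names and the `'header'` default come from the interface above.

```typescript
// Mirrors the auth block of ApiStorageOptions; the helper itself is hypothetical.
interface AuthOptions {
  type: "basic" | "bearer" | "api-key";
  username?: string;
  password?: string;
  token?: string;
  apiKeyName?: string;
  apiKeyValue?: string;
  apiKeyLocation?: "header" | "query";
}

interface AuthParts {
  headers: Record<string, string>;
  query: Record<string, string>;
}

function buildAuth(auth: AuthOptions): AuthParts {
  const parts: AuthParts = { headers: {}, query: {} };
  switch (auth.type) {
    case "basic": {
      // btoa handles Latin-1 input only; other credentials would need UTF-8 encoding first.
      const credentials = `${auth.username ?? ""}:${auth.password ?? ""}`;
      parts.headers["Authorization"] = `Basic ${btoa(credentials)}`;
      break;
    }
    case "bearer":
      parts.headers["Authorization"] = `Bearer ${auth.token ?? ""}`;
      break;
    case "api-key": {
      const name = auth.apiKeyName ?? "X-API-Key"; // assumed fallback name
      // The key goes into the headers unless 'query' is requested explicitly.
      const target = auth.apiKeyLocation === "query" ? parts.query : parts.headers;
      target[name] = auth.apiKeyValue ?? "";
      break;
    }
  }
  return parts;
}

console.log(buildAuth({ type: "bearer", token: "your-api-token" }).headers);
// prints { Authorization: 'Bearer your-api-token' }
```

Returning headers and query parameters separately lets the caller merge them with the `headers` option and the endpoint URL however its HTTP client requires.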