Feedstock
Crawl, scrape, and extract structured data in TypeScript
Feedstock is a high-performance web crawler and scraper designed for TypeScript developers seeking both speed and precise control. Key capabilities:
• TypeScript-native execution on Bun
• Supports multiple browser backends (Playwright, CDP, Lightpanda)
• Advanced deep crawling and navigation strategies
• Structured data extraction (CSS, XPath, regex)
• Built-in anti-bot detection and proxy rotation
Feedstock covers navigation, extraction, and content processing. It handles dynamic pages with hydration-aware readiness checks, can block resources to speed up crawls, and downsamples the DOM for efficiency. Extraction strategies include in-page and composite extraction, alongside markdown generation and accessibility snapshots.
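To make the resource-blocking idea concrete, here is a minimal sketch of a blocking policy written as a pure function. It is not Feedstock's API: `ResourceType`, `BlockPolicy`, `defaultPolicy`, and `shouldBlock` are hypothetical names chosen for illustration, and the blocked types and hosts are example values only.

```typescript
// Hypothetical sketch, not Feedstock's API: decide whether a request is worth loading.
// The resource types listed here are a subset of those a browser backend may report.
type ResourceType =
  | "document" | "script" | "stylesheet" | "image"
  | "font" | "media" | "xhr" | "fetch" | "other";

interface BlockPolicy {
  blockedTypes: ReadonlySet<ResourceType>;
  blockedHostSuffixes: readonly string[];
}

// Example values only: heavy assets plus two well-known tracking hosts.
const defaultPolicy: BlockPolicy = {
  blockedTypes: new Set<ResourceType>(["image", "font", "media"]),
  blockedHostSuffixes: ["doubleclick.net", "google-analytics.com"],
};

function shouldBlock(
  url: string,
  type: ResourceType,
  policy: BlockPolicy = defaultPolicy,
): boolean {
  // Never block the page itself.
  if (type === "document") return false;
  if (policy.blockedTypes.has(type)) return true;

  let host: string;
  try {
    host = new URL(url).hostname;
  } catch {
    // Unparseable URLs are left alone rather than guessed at.
    return false;
  }
  // Match the host exactly or as a subdomain of a blocked suffix.
  return policy.blockedHostSuffixes.some(
    (suffix) => host === suffix || host.endsWith("." + suffix),
  );
}
```

With a Playwright backend, a predicate like this could be called from a `page.route("**/*", ...)` handler, aborting the route when it returns `true` and continuing it otherwise. Keeping the decision in a pure function makes the policy easy to unit test without launching a browser.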
Feedstock also provides rate limiting, `robots.txt` adherence, and SQLite caching with cache validation and freshness controls. For focused crawling, it integrates URL filters and scorers, including a neural quality scorer. Browser-level controls cover fingerprint consistency, interactive elements, and storage state.
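The way filters and scorers combine into focused crawling can be sketched as a best-first frontier: filters decide whether a URL is eligible at all, and scorers decide which eligible URL is visited next. Everything below is an assumption for illustration (`UrlFilter`, `UrlScorer`, `Frontier`, and the helper functions are hypothetical names, not Feedstock's API), and a simple keyword scorer stands in for the neural quality scorer.

```typescript
// Hypothetical sketch, not Feedstock's API: a best-first crawl frontier.
type UrlFilter = (url: URL) => boolean;
type UrlScorer = (url: URL) => number;

// Filters: keep the crawl on one host and skip obvious binary assets.
const sameHost = (host: string): UrlFilter => (url) => url.hostname === host;
const notBinary: UrlFilter = (url) =>
  !/\.(png|jpe?g|gif|pdf|zip)$/i.test(url.pathname);

// Scorers: reward keyword hits in the path, penalise deep paths slightly.
const keywordScorer = (keywords: string[]): UrlScorer => (url) => {
  const path = url.pathname.toLowerCase();
  return keywords.filter((k) => path.includes(k)).length;
};
const depthPenalty: UrlScorer = (url) =>
  -0.1 * url.pathname.split("/").filter(Boolean).length;

class Frontier {
  private queue: { url: string; score: number }[] = [];
  private seen = new Set<string>();
  private filters: UrlFilter[];
  private scorers: UrlScorer[];

  constructor(filters: UrlFilter[], scorers: UrlScorer[]) {
    this.filters = filters;
    this.scorers = scorers;
  }

  // Returns true if the URL was accepted and queued.
  add(raw: string): boolean {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      return false;
    }
    url.hash = ""; // fragments do not identify distinct pages
    const key = url.href;
    if (this.seen.has(key)) return false;
    if (!this.filters.every((f) => f(url))) return false;
    this.seen.add(key);
    const score = this.scorers.reduce((sum, s) => sum + s(url), 0);
    this.queue.push({ url: key, score });
    return true;
  }

  // Returns the highest-scoring queued URL, or undefined when empty.
  next(): string | undefined {
    this.queue.sort((a, b) => b.score - a.score);
    return this.queue.shift()?.url;
  }
}
```

Sorting on every `next()` call keeps the sketch short; a real frontier handling many URLs would more likely use a heap. Summing scorer outputs is also just one possible way to combine them, and weighted sums or a minimum-score threshold would slot into the same shape.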
Feedstock is aimed at TypeScript developers building data collection pipelines for market research, content aggregation, and dataset generation. It provides the low-level control and high-speed execution these projects need.