
LangShake vs Traditional Crawling: 20x Faster, 90% Less CPU/Bandwidth, Verified AI Content

May 16, 2025
Crawling the web today is noisy, inefficient, and expensive for everyone involved.

🔹 Webmasters often face overloaded servers, maxed-out bandwidth, and even unexpected bills from aggressive bots scraping their sites 24/7.
🔹 Crawlers and AI agents waste time and compute parsing bloated HTML, dynamic content, or pages with missing or malformed schema.

LangShake flips this model. It gives webmasters full control over what structured data is exposed, in a clean, verifiable format, while dramatically reducing resource usage. In our benchmark, LangShake consumed:
  • ~93% less CPU
  • ~96% less bandwidth
  • ~7% less RAM
By exposing only what's needed, in a format built for machines, LangShake lets AI agents extract structured content with ~90% fewer resources. The result?

✅ Fewer headaches and lower bills for webmasters
✅ Faster, cleaner, more trustworthy data for crawlers

LangShake is a win-win: you control the signal, and machines skip the noise. Learn more in the whitepaper.

We benchmarked xevi.work, the first LangShake-ready site, with 8 pages using both LangShake and traditional crawling (Selenium + Cheerio). Here are the results:
| Metric | LangShake | Traditional | Savings |
| --- | --- | --- | --- |
| Total Duration | 1.57s | 36.41s | ~23.2× faster |
| CPU Time | 46.2 ms | 621.9 ms | ~93% less CPU |
| Peak Memory | 91.1 MB | 97.8 MB | ~7% less RAM |
| Data Downloaded | 25.3 KB | 655.9 KB | ~96% less bandwidth |
| Avg Request Time | 689 ms | 11.77s | ~94% faster per request |
| Download RPS | 5.73 req/s | 0.22 req/s | ~26× higher throughput |
| Errors | 0 | 0 | ✅ Stable on both sides |
| Schema Accuracy | ✅ | ✅ | ✅ Equivalent |
| Merkle Root Match | ✅ | ✅ | ✅ Trusted at all levels |
If you run a 1,000-page site, the difference in time, CPU, and bandwidth is not a rounding error: the per-page savings stay roughly constant, so the absolute gap grows with every page you serve. Here's an extrapolation based on the same 8-page benchmark:
| Pages Crawled | LangShake Time | Traditional Time | Bandwidth Savings | CPU Time Saved |
| --- | --- | --- | --- | --- |
| 100 | ~20s | ~7.6 min | ~6MB vs 82MB | ~5.7 sec |
| 1,000 | ~3 min | ~76 min | ~62MB vs 820MB | ~57 sec |
| 10,000 | ~30 min | ~12.6 hrs | ~620MB vs 8.2GB | ~9.5 min |
These numbers assume similar concurrency, network conditions, and content size per page.
Real-world performance may vary based on JS rendering and HTML complexity.
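For intuition, here is how the time columns above fall out of straight linear scaling of the 8-page run. This is a minimal TypeScript sketch: the two constants come from the benchmark table, and the constant-per-page-cost assumption is ours.

// Extrapolate total crawl time from the 8-page benchmark,
// assuming per-page cost stays constant as the site grows.
const BENCH_PAGES = 8;
const LANGSHAKE_SECS = 1.57;    // total duration, LangShake
const TRADITIONAL_SECS = 36.41; // total duration, Selenium + Cheerio

function extrapolateSeconds(pages: number): { langshake: number; traditional: number } {
  const factor = pages / BENCH_PAGES;
  return { langshake: LANGSHAKE_SECS * factor, traditional: TRADITIONAL_SECS * factor };
}

for (const pages of [100, 1_000, 10_000]) {
  const t = extrapolateSeconds(pages);
  console.log(
    `${pages} pages: ~${Math.round(t.langshake)}s LangShake vs ~${(t.traditional / 60).toFixed(1)} min traditional`
  );
}
// 100 pages: ~20s LangShake vs ~7.6 min traditional
// 1000 pages: ~196s LangShake vs ~75.9 min traditional
// 10000 pages: ~1963s LangShake vs ~758.5 min traditional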
LangShake doesn't just reduce crawling pain; it builds a trust layer into your content (see the sketch after this list):
  • ✅ JSON-LD with checksums → verifiable, tamper-resistant data
  • ✅ .llm.json with Merkle root → integrity across the site
  • ✅ Accurate content without parsing broken HTML
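To make the two levels of trust concrete, here is a minimal sketch of how an agent could verify a site. Everything in it is an illustrative assumption on our part: the index shape (merkleRoot, pages, checksum), hashing the raw JSON-LD text, and the pairwise SHA-256 Merkle scheme; consult the whitepaper and the repos below for LangShake's actual format.

import { createHash } from "node:crypto";

// Hypothetical index shape; LangShake's real schema may differ.
interface LlmIndex {
  merkleRoot: string;
  pages: { url: string; checksum: string }[]; // checksum of each page's JSON-LD
}

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Recompute a Merkle root over the per-page checksums (illustrative scheme:
// pairwise SHA-256, duplicating the last node on odd-sized levels).
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return sha256("");
  let level = leaves;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i];
      next.push(sha256(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

async function verifySite(origin: string): Promise<boolean> {
  const index: LlmIndex = await (
    await fetch(`${origin}/.well-known/llm.json`)
  ).json();

  // 1. Each page's JSON-LD must hash to its advertised checksum.
  for (const page of index.pages) {
    const jsonLd = await (await fetch(page.url)).text();
    if (sha256(jsonLd) !== page.checksum) return false; // tampered page
  }

  // 2. The page checksums must roll up to the advertised Merkle root.
  return merkleRoot(index.pages.map((p) => p.checksum)) === index.merkleRoot;
}

The point of this design: one small index, one JSON-LD file per page, and two cheap hash checks replace fetching and parsing full HTML.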
It's like robots.txt, but for AI agents.

Want to make your site AI- and LLM-friendly while saving bandwidth, compute, and headaches? You can get started right now using our open-source tools:
  1. LangShake-It CLI
    Generate .well-known/llm.json and per-page JSON-LD with verifiable checksums.
  2. Shake-Proof CLI
    Benchmark and validate your site against traditional crawling. Measure speed, trust, and performance.
LangShake is not on npm yet, but everything is available on GitHub.
Clone the repo, follow the quickstart, and make your site LangShake-ready today.
LangShake-It CLI
# Clone the repo and install the CLI globally from the local checkout
# (the package is not published to npm yet)
git clone https://github.com/langshake/langshake-it
cd langshake-it
npm install -g .

# cd to your site's working directory, then:
langshakeit init
npm run build
langshakeit --input out --out public/langshake --llm public/.well-known/llm.json
Shake-Proof CLI
git clone https://github.com/langshake/shake-proof
cd shake-proof
npm install
npm link   # puts the shakeproof command on your PATH

# Benchmark your domain against traditional crawling
shakeproof --url https://yourdomain.com --json
Within minutes, you'll have:
  • Verifiable structured data
  • A global .llm.json index
  • Full benchmark reports comparing your current setup to LangShake
Make your content AI-optimized and machine-trusted.
Explore the LangShake project on GitHub