
just-scrape

Made with love by the ScrapeGraphAI team 💜

Demo Video

Command-line interface for ScrapeGraph AI — AI-powered web scraping, data extraction, search, crawling, and page-change monitoring.

v1.0.0 — SDK v2 migration. This release migrates the CLI to the scrapegraph-js v2 SDK. The v1 endpoints (smart-scraper, search-scraper, markdownify, sitemap, agentic-scraper, generate-schema) have been removed. Use scrape --format … for multi-format output, extract for structured data, and the new monitor command for page-change tracking.

Project Structure

just-scrape/
├── src/
│   ├── cli.ts
│   ├── lib/
│   │   ├── env.ts
│   │   ├── folders.ts
│   │   └── log.ts
│   ├── commands/
│   │   ├── scrape.ts
│   │   ├── extract.ts
│   │   ├── search.ts
│   │   ├── crawl.ts
│   │   ├── monitor.ts
│   │   ├── history.ts
│   │   ├── credits.ts
│   │   └── validate.ts
│   └── utils/
│       └── banner.ts
├── dist/
├── tests/
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── biome.json
└── .gitignore

Installation

# Global install (pick one)
npm install -g just-scrape
pnpm add -g just-scrape
yarn global add just-scrape
bun add -g just-scrape

# Or run without installing
npx just-scrape --help
bunx just-scrape --help

Package: just-scrape on npm.

Coding Agent Skill

You can use just-scrape as a skill for AI coding agents via Vercel's skills.sh.

Or install it manually:

bunx skills add https://github.com/ScrapeGraphAI/just-scrape

Browse the skill: skills.sh/scrapegraphai/just-scrape/just-scrape

Configuration

The CLI needs a ScrapeGraph API key. Get one at https://scrapegraphai.com/dashboard.

Four ways to provide it:

  1. Environment variable: export SGAI_API_KEY="sgai-..."
  2. .env file: SGAI_API_KEY=sgai-...
  3. Config file: ~/.scrapegraphai/config.json
  4. Interactive prompt

Environment Variables

Variable      Description                  Default
SGAI_API_KEY  ScrapeGraph API key          (required)
SGAI_API_URL  Override the API base URL    https://v2-api.scrapegraphai.com
SGAI_TIMEOUT  Request timeout in seconds   120
SGAI_DEBUG    Enable debug logs            0

JSON Mode (--json)

just-scrape credits --json | jq '.remaining'
just-scrape scrape https://example.com --json > result.json
just-scrape history scrape --json | jq '.[].id'
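Because every command can emit JSON, the CLI composes with jq for filtering and counting. A small sketch: the array shape below (`id` and `status` fields) is an assumption for illustration, and `echo` stands in for `just-scrape history scrape --json` so the pipeline runs without an API key.

```shell
# Count completed requests in a history listing.
# The echo'd payload simulates `just-scrape history scrape --json` output.
echo '[{"id":"req_1","status":"completed"},{"id":"req_2","status":"failed"}]' \
  | jq '[.[] | select(.status == "completed")] | length'
# prints 1
```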

Scrape

Fetch a URL and return one or more formats: markdown, html, screenshot, branding, links, images, summary, or json (AI extraction). Default: markdown.

just-scrape scrape https://example.com
just-scrape scrape https://example.com -f markdown,links,images
just-scrape scrape https://example.com -f json -p "Extract all products"
just-scrape scrape https://app.example.com --mode js --stealth --scrolls 5
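Since `scrape` writes to stdout, batching over a URL list is a one-liner loop. A sketch (the filename scheme is our own; `echo` prints each command instead of spending credits):

```shell
# Batch-scrape a URL list to per-site markdown files (dry run).
for url in https://example.com https://example.org; do
  # Derive a filename from the URL: strip the scheme, replace / and . with _
  out="$(echo "$url" | sed -E 's#https?://##; s#[/.]#_#g').md"
  echo "just-scrape scrape $url > $out"
done
```

Drop the `echo` (keeping the redirect) to run the scrapes for real.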

Extract

Extract structured JSON from a known URL with AI. This uses a dedicated endpoint optimized for extraction: functionally equivalent to scrape -f json, but tuned for that path.

just-scrape extract https://store.example.com -p "Extract product names and prices"
just-scrape extract https://news.example.com -p "Get headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array"}}}'
just-scrape extract https://app.example.com -p "Extract user stats" \
  --cookies '{"session":"abc123"}' --stealth
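Larger schemas are easier to keep in a file and inline with command substitution. A sketch extending the array example above (the `headline`/`date` fields are our own illustration matching the prompt):

```shell
# Keep the JSON Schema in a file and sanity-check it before use.
cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "articles": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "headline": { "type": "string" },
          "date": { "type": "string" }
        }
      }
    }
  }
}
EOF
jq -e '.type == "object"' schema.json   # fails fast if the JSON is malformed
# then:
#   just-scrape extract https://news.example.com -p "Get headlines and dates" \
#     --schema "$(cat schema.json)"
```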

Search

Search the web and optionally extract structured data from the results.

just-scrape search "Best Python web frameworks in 2026" --num-results 10
just-scrape search "Top 5 cloud providers pricing" \
  -p "Extract provider name and free-tier details"
just-scrape search "AI regulation EU" --time-range past_week --country eu

Crawl

Crawl multiple pages from a starting URL. Returns a job that's polled until completion.

just-scrape crawl https://docs.example.com --max-pages 50 --max-depth 3
just-scrape crawl https://example.com \
  --include-patterns '["^https://example\\.com/blog/.*"]' \
  --exclude-patterns '[".*\\.pdf$"]'
just-scrape crawl https://example.com -f markdown,links,images --max-pages 20
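The --include-patterns/--exclude-patterns flags take JSON arrays of regex strings, where shell quoting mistakes are easy to make. A sketch that validates the array with jq before passing it on:

```shell
# Check the pattern list is valid JSON: an array of strings.
patterns='["^https://example\\.com/blog/.*", ".*\\.html$"]'
echo "$patterns" | jq -e 'type == "array" and all(type == "string")'
# prints true
# then: just-scrape crawl https://example.com --include-patterns "$patterns"
```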

Monitor

Schedule a page to be re-scraped on a cron interval and (optionally) post diffs to a webhook. Actions: create, list, get, update, pause, resume, delete, activity.

just-scrape monitor create \
  --url https://store.example.com/pricing \
  --interval 1h \
  --webhook-url https://hooks.example.com/pricing
just-scrape monitor list
just-scrape monitor activity --id mon_abc123 --limit 50
just-scrape monitor pause --id mon_abc123

--interval accepts a cron expression (0 * * * *) or shorthand (1h, 30m, 1d).
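A few example values, mixing the shorthand and standard five-field cron forms (fields: minute, hour, day-of-month, month, day-of-week); whether the service supports every cron extension is not stated, so the cron examples stick to basic syntax:

```
30m            # shorthand: every 30 minutes
1d             # shorthand: once a day
0 * * * *      # cron: at the top of every hour
0 9 * * 1      # cron: Mondays at 09:00
*/15 * * * *   # cron: every 15 minutes
```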

History

Browse past requests. Interactive by default (arrow keys); pass an ID to view a specific request. Services: scrape, extract, search, crawl, monitor.

just-scrape history                                    # all services, interactive
just-scrape history extract
just-scrape history scrape req_abc123 --json
just-scrape history crawl --json --page-size 100 | jq '.[] | {id, status}'

Credits

Check your remaining credit balance.

just-scrape credits
just-scrape credits --json | jq '.remaining'
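The JSON form makes a low-balance alert a few lines of shell. A sketch: the `remaining` field matches the jq example above, and `echo` stands in for `just-scrape credits --json` so the sketch runs without an API key.

```shell
# Warn when the credit balance drops below a threshold.
remaining=$(echo '{"remaining": 42}' | jq '.remaining')   # stand-in for: just-scrape credits --json
if [ "$remaining" -lt 100 ]; then
  echo "low credits: $remaining"
fi
# prints: low credits: 42
```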

Validate

Health-check the API and validate your key.

just-scrape validate

Contributing

git clone https://github.com/ScrapeGraphAI/just-scrape
cd just-scrape
bun install
bun run dev --help


