Skip to content

fix(chromium): fix 3 P0 bugs and 2 P1 issues in chromium loader + cleanups#1088

Open
ghshhf wants to merge 4 commits into
ScrapeGraphAI:mainfrom
ghshhf:fix/chromium-all-bugs
Open

fix(chromium): fix 3 P0 bugs and 2 P1 issues in chromium loader + cleanups#1088
ghshhf wants to merge 4 commits into
ScrapeGraphAI:mainfrom
ghshhf:fix/chromium-all-bugs

Conversation

@ghshhf

@ghshhf ghshhf commented Jun 16, 2026

Copy link
Copy Markdown

Summary

This PR fixes 3 critical (P0) bugs and 2 high-priority (P1) issues in the chromium loader.

P0 Bugs Fixed

P0-1: timeout=0 guard short-circuit causes 6h CI hang

File: chromium.py:130
Root cause: if timeout and timeout <= 0 - When timeout=0, 0 and ... short-circuits to falsy, so the ValueError is never raised. Execution falls through to real network calls, causing the test suite to hang until the 6h CI kill limit.
Fix: Changed to if timeout is not None and timeout <= 0.

P0-2: finally: await browser.close() may raise UnboundLocalError

File: chromium.py:188
Root cause: If browser is None (invalid browser_name) or never assigned (async_playwright fails), the finally block crashes.
Fix: Added if browser is not None guard.

P0-3: finally: driver.quit() may reference unbound variable

File: chromium.py:101
Root cause: If driver initialization fails, driver is never assigned.
Fix: Added scope check before driver.quit().

Additional Fixes

  • lazy_load() selenium dispatch: Extract _get_scraping_fn() that properly routes to ascrape_undetected_chromedriver
  • Redundant kwargs.get(): Remove redundant kwargs reads for already-explicit params
  • Typo fix: scraperaphai -> scrapegraphai in pyproject.toml
  • Empty test files: Fill test stubs for JsonScraperMultiGraph and SmartScraperMultiConcatGraph

ghshhf added 4 commits June 16, 2026 20:45
- P0-1: Fix timeout=0 guard short-circuit (if timeout is not None)
- P0-2: Guard finally: await browser.close() against None
- P0-3: Guard finally: driver.quit() against unbound variable
- P1-2: Fix _get_scraping_fn() for selenium backend dispatch
- P1-4: Remove redundant kwargs.get() for direct params
@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working tests Improvements or additions to test labels Jun 16, 2026
@VinciGit00

Copy link
Copy Markdown
Member

Thanks for the PR — the core chromium.py fixes are solid (the selenium dispatch in lazy_load, the unbound-variable crashes in the finally blocks, and the timeout=0 guard all address real bugs). But the added tests need fixing before this can be merged:

Blocking — tests/test_json_scraper_multi_graph.py will fail at import. It does from scrapegraphai.graphs import JsonScraperMultiGraph, but the exported class is JSONScraperMultiGraph (uppercase JSON, see scrapegraphai/graphs/__init__.py). As written this raises ImportError and turns CI red. Please update the import and the references to use JSONScraperMultiGraph.

Tests are too loose. Both test_empty_config_raises_error and test_empty_sources_raises_error use pytest.raises(Exception), which will pass even if the failure comes from something unrelated (e.g. LLM/model init) rather than the validation you intend to test. Please assert on the specific exception type (and ideally match the message) so the tests actually verify the intended behavior.

A couple of smaller, non-blocking notes on the source fixes:

  • if "driver" in dir(): works but is fragile — prefer initializing driver = None before the try and checking if driver is not None.
  • For ascrape_playwright_scroll, browser = None is set inside the async with async_playwright() as p: block, so if entering that context manager fails, browser is still unbound and if browser is not None in the finally would itself raise UnboundLocalError. Moving the browser = None initialization above the try would fully cover that case.

Also consider splitting the unrelated changes (the pyproject.toml typo fix and the two new test files) into separate PRs to keep this one focused on the chromium loader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:M This PR changes 30-99 lines, ignoring generated files. tests Improvements or additions to test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants