feat: added two google carousel parsers by Ed-Lovera · Pull Request #392 · serpapi/code-challenge

Ed-Lovera · 2026-06-23T06:41:42Z

LCA Container Detection: The parser finds the Lowest Common Ancestor (LCA) container of all elements that match the carousel criteria (each contains an image, text, and a Google search path).
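For reviewers unfamiliar with the LCA step, here is a minimal sketch of the idea in plain Ruby. It uses a hypothetical `Node` struct with a parent pointer instead of real Nokolexbor nodes, and the helper names `ancestors_of` and `lowest_common_ancestor` are illustrative, not the names used in this PR.

```ruby
# Hypothetical stand-in for a DOM node: a label plus a parent pointer.
Node = Struct.new(:name, :parent)

# Chain of nodes from the document root down to the node itself.
def ancestors_of(node)
  chain = []
  while node
    chain.unshift(node)
    node = node.parent
  end
  chain
end

# Deepest node that appears at the same position in every chain.
def lowest_common_ancestor(nodes)
  return nil if nodes.empty?

  chains = nodes.map { |n| ancestors_of(n) }
  shortest = chains.map(&:length).min
  lca = nil
  shortest.times do |i|
    candidate = chains.first[i]
    # Compare by identity, since two distinct nodes may look alike.
    break unless chains.all? { |c| c[i].equal?(candidate) }
    lca = candidate
  end
  lca
end

# Made-up tree: body > section > carousel > two items, plus a footer.
body     = Node.new("body", nil)
section  = Node.new("section", body)
carousel = Node.new("div.carousel", section)
item_a   = Node.new("a#1", carousel)
item_b   = Node.new("a#2", carousel)
footer   = Node.new("footer", body)

puts lowest_common_ancestor([item_a, item_b]).name
puts lowest_common_ancestor([item_a, footer]).name
```

With only carousel items as input the result is the carousel container; adding an unrelated element pushes the result up towards the root, which is why the candidate set has to be filtered by the carousel criteria first.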
Dynamic Header Resolution: Starting at the LCA container, the parser walks up the DOM parent tree to find the nearest local section header (supporting standard h2/h3 tags, ARIA headings, and Google entity titles). The last word of the header is used as the dynamic JSON key (e.g., "artworks", "books", "albums").
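The key derivation described above can be sketched as follows. `header_to_key` is a hypothetical helper, the sample headers are made up, and the `"items"` fallback for an empty header is an assumption that the PR description does not mention.

```ruby
# Hypothetical helper: derive the JSON key from a section header.
# The fallback value is an assumption for headers with no words.
def header_to_key(header_text, fallback: "items")
  words = header_text.to_s.strip.split(/\s+/)
  return fallback if words.empty?

  # The last word of the header, lowercased, becomes the key.
  words.last.downcase
end

puts header_to_key("Artworks")
puts header_to_key("  Popular Books ")
puts header_to_key("")
```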
Nokolexbor ::text Selector: Rather than manually filtering nested tags and children to find text nodes, the parser uses Nokolexbor's ::text selector. This lets us grab text values directly from any container tag, which makes parsing tag-agnostic.
Thumbnail Resolution: Google loads the first carousel images using deferred script blocks at the bottom of the page. The parser scans the <script> blocks to build an ID-to-Base64 map, resolves any escape sequences, and maps them to image elements (falling back to the data-src attribute if they are not script-loaded).
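As a rough illustration of the thumbnail step, the sketch below builds the ID-to-Base64 map from raw script text using only the Ruby standard library. The script shape matched by `SCRIPT_PATTERN` (a `var s='...'` data URI followed by `var ii=['...']`) is an assumption about the markup, the sample payload is a placeholder, and `build_thumbnail_map` and `resolve_thumbnail` are hypothetical names rather than the PR's methods.

```ruby
# Assumed shape of a deferred image script; the real markup may differ.
SCRIPT_PATTERN = /var s='(data:image\/[^']+)';\s*var ii=\['([^']+)'\]/

# Scan script bodies and collect image ID => data URI pairs.
def build_thumbnail_map(script_texts)
  script_texts.each_with_object({}) do |text, map|
    text.scan(SCRIPT_PATTERN) do |data_uri, image_id|
      # Decode JavaScript \xNN escapes, e.g. \x3d becomes "=".
      map[image_id] = data_uri.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
    end
  end
end

# Prefer the script-loaded image, else fall back to the data-src value.
def resolve_thumbnail(image_id, data_src, map)
  map.fetch(image_id) { data_src }
end

# Placeholder script text; the single-quoted heredoc keeps \x3d literal.
script = <<~'JS'
  (function(){var s='data:image/jpeg;base64,AAAA\x3d\x3d';var ii=['dimg_1'];_setImagesSrc(ii,s);})();
JS

thumbnails = build_thumbnail_map([script])
puts resolve_thumbnail("dimg_1", nil, thumbnails)
puts resolve_thumbnail("dimg_9", "https://example.com/thumb.jpg", thumbnails)
```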

The parser was split into two versions because of minor conflicts in the code challenge instructions:

Purpose: Strictly satisfies the literal instructions in the first block of the README.md file.
Layout:
- Omits the painting thumbnail ("image" key) completely.
- Wraps the Google search "link" inside a string array (e.g. ["https://www.google.com/search..."]).
Test Suite: Verified via google_carousel_parser_one_spec.rb across all fixtures.

Purpose: Matches the exact schema and layout of the expected array for Van Gogh paintings.
Layout:
- Extracts the base64 painting thumbnail ("image" key).
- Keeps the Google search "link" as a plain String.
Test Suite: Verified via google_carousel_parser_two_spec.rb (including exact structure and element equality checks against the expected JSON array).
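To make the difference between the two layouts concrete, here is a small sketch of how one extracted item might be shaped in each version. The `item` hash, its `name` field, the placeholder values, and the `layout_one` / `layout_two` helpers are all hypothetical; only the handling of the "link" and "image" keys comes from the description above.

```ruby
require "json"

# Placeholder item; these values are illustrative, not fixture data.
item = {
  name:  "Example Painting",
  link:  "https://www.google.com/search?q=example",
  image: "data:image/jpeg;base64,AAAA=="
}

# First version: no "image" key, "link" wrapped in a string array.
def layout_one(item)
  { "name" => item[:name], "link" => [item[:link]] }
end

# Second version: "image" included, "link" kept as a plain String.
def layout_two(item)
  { "name" => item[:name], "link" => item[:link], "image" => item[:image] }
end

puts JSON.generate(layout_one(item))
puts JSON.generate(layout_two(item))
```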

feat: added two google carousel parsers

0f79ea5

Ed-Lovera marked this pull request as draft June 23, 2026 06:44
