feat: added two google carousel parsers #392

Draft
Ed-Lovera wants to merge 1 commit into serpapi:master from Ed-Lovera:feat/google/carousel-parser

Overview

  • Implements Google search results carousel parsing.
  • Introduces the nokolexbor gem.
  • Introduces the rspec gem.
  • LCA Container Detection: The parser finds the Lowest Common Ancestor (LCA) container of all elements that match the carousel criteria (each contains an image, text, and a Google search path).

  • Dynamic Header Resolution: Starting at the LCA container, the parser walks up the DOM parent tree to find the nearest local section header (supporting standard h2/h3 tags, ARIA headings, and Google entity titles). The last word of the header is used as the dynamic JSON key (e.g., "artworks", "books", "albums").

  • Nokolexbor ::text Selector: Rather than manually filtering nested tags and children to find text nodes, the parser uses Nokolexbor's ::text selector. This lets it grab text values directly from any container tag, which makes parsing tag-agnostic.

  • Thumbnail Resolution: Google loads the first carousel images using deferred script blocks at the bottom of the page. The parser scans the <script> blocks to build an ID-to-Base64 map, resolves any escape sequences, and maps them to image elements (falling back to the data-src attribute if they are not script-loaded).
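The LCA detection and header-key steps described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this PR: the `Node` class is a hypothetical stand-in for Nokolexbor nodes, the helper names are invented, and the header lookup only handles h2/h3 children (ARIA headings and Google entity titles are left out).

```ruby
# Hypothetical stand-in for a DOM node; the real parser works on Nokolexbor nodes.
class Node
  attr_reader :name, :text, :children
  attr_accessor :parent

  def initialize(name, text = nil)
    @name = name
    @text = text
    @children = []
    @parent = nil
  end

  # Attaches a child and returns it so calls can be chained.
  def add(child)
    child.parent = self
    @children << child
    child
  end

  # Nodes from the root down to this node, inclusive.
  def path
    chain = []
    node = self
    while node
      chain.unshift(node)
      node = node.parent
    end
    chain
  end
end

# Deepest node that appears on every root-to-node path.
def lowest_common_ancestor(nodes)
  return nil if nodes.empty?

  paths = nodes.map(&:path)
  lca = nil
  paths.map(&:length).min.times do |depth|
    candidate = paths.first[depth]
    break unless paths.all? { |path| path[depth].equal?(candidate) }

    lca = candidate
  end
  lca
end

HEADING_TAGS = %w[h2 h3].freeze

# Walks up from the container and returns the text of the first h2/h3 child found.
def nearest_header_text(start)
  node = start
  while node
    heading = node.children.find { |child| HEADING_TAGS.include?(child.name) }
    return heading.text if heading

    node = node.parent
  end
  nil
end

# Last word of the header, lowercased, used as the JSON key.
def header_key(header_text)
  word = header_text.to_s.split.last
  word && word.downcase
end

# Usage with a made-up tree: section > (h2, wrapper > row > three links).
section = Node.new("div")
section.add(Node.new("h2", "Vincent van Gogh Artworks"))
row = section.add(Node.new("div")).add(Node.new("div"))
items = 3.times.map { row.add(Node.new("a")) }

container = lowest_common_ancestor(items)
puts header_key(nearest_header_text(container)) # prints "artworks"
```

Comparing nodes with `equal?` keeps the check on object identity, which matters because two carousel items can look structurally identical.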

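Thumbnail resolution can be illustrated with a small stdlib-only sketch. The script shape matched here (`var s='data:image...'` followed by `var ii=[...]`), the `\xNN` escape handling, and all helper names are assumptions made for illustration; the parser in this PR may match a different pattern.

```ruby
# Assumed shape of a deferred image script: var s='<data URI>';var ii=['<img id>', ...];
SCRIPT_IMAGE_PATTERN = /var s\s*=\s*'(data:image[^']+)';\s*var ii\s*=\s*\[([^\]]*)\]/

# Turns JavaScript \xNN escapes (e.g. \x3d for "=") back into characters.
def unescape_js(str)
  str.gsub(/\\x([0-9a-fA-F]{2})/) { Regexp.last_match(1).hex.chr(Encoding::UTF_8) }
end

# Builds an { image id => data URI } map from the text of each <script> block.
def build_thumbnail_map(script_texts)
  script_texts.each_with_object({}) do |script, map|
    script.scan(SCRIPT_IMAGE_PATTERN) do |data_uri, id_list|
      id_list.scan(/'([^']+)'/).flatten.each do |id|
        map[id] = unescape_js(data_uri)
      end
    end
  end
end

# Prefers the script-loaded image and falls back to the data-src attribute.
def resolve_thumbnail(img_attrs, thumbnail_map)
  thumbnail_map[img_attrs["id"]] || img_attrs["data-src"]
end

# Usage with a made-up script body and made-up image attributes.
script = <<~'JS'
  (function(){var s='data:image/jpeg;base64,AAAA\x3d\x3d';var ii=['dimg_1'];_setImagesSrc(ii,s);})();
JS

thumbnails = build_thumbnail_map([script])
puts resolve_thumbnail({ "id" => "dimg_1" }, thumbnails)
# prints "data:image/jpeg;base64,AAAA=="
puts resolve_thumbnail({ "id" => "dimg_9", "data-src" => "https://example.com/t.jpg" }, thumbnails)
# prints "https://example.com/t.jpg"
```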

Why Are There Two Versions?

The parser was split into two versions because of minor conflicts in the code challenge instructions:

Version One (GoogleCarouselParserOne)

  • Purpose: Strictly satisfies the literal instructions in the first block of the README.md file.
  • Layout:
    • Omits the painting thumbnail ("image" key) completely.
    • Wraps the Google search "link" inside a string array (e.g. ["https://www.google.com/search..."]).
  • Test Suite: Verified via google_carousel_parser_one_spec.rb across all fixtures.

Version Two (GoogleCarouselParserTwo)

  • Purpose: Matches the exact schema and layout of the expected array for Van Gogh paintings.
  • Layout:
    • Extracts the base64 painting thumbnail ("image" key).
    • Keeps the Google search "link" as a plain String.
  • Test Suite: Verified via google_carousel_parser_two_spec.rb (including exact structure and element equality checks against the expected JSON array).
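The difference between the two output layouts can be shown with a short sketch. The helper names, the "name" field, and the sample values are hypothetical; only the handling of the "image" and "link" keys follows the description above.

```ruby
# Version One layout: no "image" key, "link" wrapped in an array.
def shape_for_version_one(item)
  item.reject { |key, _| key == "image" }.merge("link" => [item["link"]])
end

# Version Two layout: "image" kept, "link" left as a plain String.
def shape_for_version_two(item)
  item.dup
end

# Made-up carousel item for illustration.
item = {
  "name" => "The Starry Night",
  "link" => "https://www.google.com/search?q=The+Starry+Night",
  "image" => "data:image/jpeg;base64,AAAA"
}

p shape_for_version_one(item).keys # prints ["name", "link"]
p shape_for_version_two(item).keys # prints ["name", "link", "image"]
```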

@Ed-Lovera Ed-Lovera marked this pull request as draft June 23, 2026 06:44