gsoc26: Mapping Conversion Layer (Layer 3) #60

@DhanashreePetare

Description

Feature Request

Describe the feature you'd like
Add Layer 3 (cross-class mapping conversion) to the Databus Python Client download pipeline. Users should be able to convert downloaded RDF files to CSV and vice versa, and convert between RDF triple and quad formats, using the existing --convert-format flag extended with --graph-name and --base-uri flags.
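The interplay of the three flags could look roughly like the sketch below. It is only an illustration: `argparse` stands in for whatever CLI framework the client actually uses, the format names (`nquads`, `trig`, `csv`) are assumed values, and `check_mapping_args` is a hypothetical helper, not part of the client.

```python
import argparse

# Assumed format names; the issue does not list the accepted values.
QUAD_FORMATS = {"nquads", "trig"}
TABULAR_FORMATS = {"csv"}


def build_parser():
    """Parser exposing only the three flags named in this issue."""
    parser = argparse.ArgumentParser(prog="download-sketch")
    parser.add_argument("--convert-format", help="target format for downloaded files")
    parser.add_argument("--graph-name", help="named graph IRI for triples-to-quads")
    parser.add_argument("--base-uri", help="base IRI used when converting CSV to RDF")
    return parser


def check_mapping_args(source_format, args):
    """Return error messages for mapping flags that are required but missing."""
    target = args.convert_format
    if target is None:
        return []  # no conversion requested, nothing to validate
    errors = []
    if target in QUAD_FORMATS and source_format not in QUAD_FORMATS and not args.graph_name:
        errors.append("--graph-name is required when converting triples to quads")
    if source_format in TABULAR_FORMATS and target not in TABULAR_FORMATS and not args.base_uri:
        errors.append("--base-uri is required when converting CSV to RDF")
    return errors
```

Validating up front, before any file is downloaded, would let the client fail fast instead of halfway through a collection.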

Why is this feature important?
Many applications consume tabular data (CSV/TSV) but datasets on the Databus are published in RDF formats. Cross-class conversion eliminates the manual mapping step and makes Databus datasets directly consumable by tabular applications. It also enables named graph workflows by supporting triples-to-quads promotion and quads-to-triples splitting.

Describe alternatives you've considered
The Java client uses TARQL for CSV-to-RDF mapping. The Python client uses a direct rdflib-based approach instead, which avoids adding a new dependency. RDF-to-CSV conversion on its own drops RDF datatypes and language tags, so a companion metadata JSON file records them; when both files are used together, the round trip is lossless.
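To illustrate the companion-file idea, here is a minimal standard-library sketch. It deliberately avoids rdflib so it runs anywhere. The one-row-per-triple CSV layout, the `Term` type, and the `{"objects": [...]}` metadata schema are all assumptions made for the example; the issue does not specify either file format, and the `--base-uri` handling is left out.

```python
import csv
import io
import json
from typing import NamedTuple, Optional


class Term(NamedTuple):
    """Hypothetical stand-in for an RDF object term."""
    value: str
    kind: str = "iri"               # "iri" or "literal"
    datatype: Optional[str] = None  # datatype IRI, if any
    lang: Optional[str] = None      # language tag, if any


def triples_to_csv(triples):
    """Return (csv_text, meta_json_text) for a list of (s, p, Term) triples."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["subject", "predicate", "object"])
    meta = []
    for s, p, o in triples:
        writer.writerow([s, p, o.value])  # CSV keeps only the lexical form
        meta.append({"kind": o.kind, "datatype": o.datatype, "lang": o.lang})
    return buf.getvalue(), json.dumps({"objects": meta}, indent=2)


def csv_to_triples(csv_text, meta_text=None):
    """Rebuild triples; without the companion file, objects become plain literals."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))[1:]  # skip header
    meta = json.loads(meta_text)["objects"] if meta_text else [None] * len(rows)
    triples = []
    for (s, p, value), m in zip(rows, meta):
        if m is None:
            o = Term(value, kind="literal")
        else:
            o = Term(value, m["kind"], m["datatype"], m["lang"])
        triples.append((s, p, o))
    return triples
```

With the metadata, `csv_to_triples(*triples_to_csv(triples))` returns the original list. Without it, a typed literal such as `"42"^^xsd:integer` comes back as the untyped string `"42"`, which is the loss the companion file is meant to prevent.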

Additional context
Supported mapping directions:

| From | To | Notes |
| --- | --- | --- |
| RDF Triples | RDF Quads | Requires --graph-name. Lossless. |
| RDF Quads | RDF Triples | Splits into one file per named graph. Lossless. |
| RDF Triples | CSV | Quasi-equal. Companion .meta.json generated. |
| CSV | RDF Triples | Lossless if companion file present. Requires --base-uri. |
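The first two directions can be sketched on line-based N-Triples and N-Quads text. This is a simplified illustration using only the standard library and a regular-expression tokenizer, not the rdflib-based implementation proposed here; `promote_to_quads` and `split_quads` are hypothetical names, and the tokenizer handles only straightforward well-formed statements.

```python
import re
from collections import defaultdict

# Matches an IRI, a blank node label, or a literal with optional datatype/language tag.
TERM = re.compile(r'<[^>]*>|_:[^\s]+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?')


def _statements(text):
    """Yield the term list of every non-empty, non-comment line."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield TERM.findall(line)


def promote_to_quads(ntriples_text, graph_iri):
    """Triples to quads: append the same graph IRI to every statement."""
    out = []
    for terms in _statements(ntriples_text):
        if len(terms) != 3:
            raise ValueError(f"not an N-Triples statement: {terms!r}")
        out.append(" ".join(terms) + f" <{graph_iri}> .")
    return "".join(line + "\n" for line in out)


def split_quads(nquads_text):
    """Quads to triples: group statements by graph label (None = default graph)."""
    graphs = defaultdict(list)
    for terms in _statements(nquads_text):
        if len(terms) not in (3, 4):
            raise ValueError(f"not an N-Quads statement: {terms!r}")
        graph = terms[3] if len(terms) == 4 else None
        graphs[graph].append(" ".join(terms[:3]) + " .")
    return dict(graphs)
```

Each key returned by `split_quads` would correspond to one output file. Both directions are lossless because the graph label is the only thing added or removed.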

Out of scope for this issue:

  • RDF Quads to CSV (stretch goal, significant added complexity)
  • TARQL integration

Metadata

Labels

enhancement (New feature or request)
