Environment
cyclonedx-python-lib 9.1.0 (the offending code path is unchanged on main / 11.x)
- Python 3.10, Linux
Description
Serializing a BOM with many components scales quadratically with the number of components, not linearly. For a container SBOM with several thousand components, output_as_string() stalls for minutes, almost entirely inside Bom.validate().
Steps to reproduce
import time
from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component, ComponentType
from cyclonedx.output import make_outputter
from cyclonedx.schema import OutputFormat, SchemaVersion
for N in (1000, 2000, 4000, 8000):
bom = Bom()
bom.metadata.component = Component(name="root", type=ComponentType.CONTAINER, bom_ref="root")
for i in range(N):
bom.components.add(Component(name=f"c{i}", version="1.0",
type=ComponentType.LIBRARY, bom_ref=f"ref-{i}"))
out = make_outputter(bom=bom, output_format=OutputFormat.JSON,
schema_version=SchemaVersion.V1_6)
t = time.perf_counter(); out.output_as_string()
print(f"N={N}: {time.perf_counter()-t:.2f}s")
Expected vs actual
Expected: time grows roughly linearly with N.
Actual: time grows roughly 4x per 2x N (quadratic):
N=1000: 0.14s
N=2000: 0.48s (x3.4)
N=4000: 1.75s (x3.6)
N=8000: 6.70s (x3.8)
A cProfile run at N=15000 shows ~112M calls to the lambda at cyclonedx/model/bom.py:653.
Root cause
cyclonedx/output/json.py — generate() unconditionally calls bom.validate() during serialization (there is no opt-out).
cyclonedx/model/bom.py — Bom.validate() calls self.register_dependency(target=...) once per component (and per service).
cyclonedx/model/bom.py — register_dependency() locates the existing entry with a linear scan:
_d = next(filter(lambda _d: _d.ref == target.bom_ref, self.dependencies), None)
Called once per component over the growing dependency collection, this is O(N²) overall.
Proposed fix
Replace the linear lookup with an indexed (dict[ref -> Dependency]) lookup. In a local benchmark an indexed variant produces byte-identical output and gives ~8x speedup at N=6000 (the gap widens with N). I'm happy to open a PR with the fix and a regression/scaling test.
Environment
cyclonedx-python-lib9.1.0 (the offending code path is unchanged onmain/ 11.x)Description
Serializing a BOM with many components scales quadratically with the number of components, not linearly. For a container SBOM with several thousand components,
output_as_string()stalls for minutes, almost entirely insideBom.validate().Steps to reproduce
Expected vs actual
Expected: time grows roughly linearly with N.
Actual: time grows roughly 4x per 2x N (quadratic):
A
cProfilerun at N=15000 shows ~112M calls to the lambda atcyclonedx/model/bom.py:653.Root cause
cyclonedx/output/json.py—generate()unconditionally callsbom.validate()during serialization (there is no opt-out).cyclonedx/model/bom.py—Bom.validate()callsself.register_dependency(target=...)once per component (and per service).cyclonedx/model/bom.py—register_dependency()locates the existing entry with a linear scan:Proposed fix
Replace the linear lookup with an indexed (
dict[ref -> Dependency]) lookup. In a local benchmark an indexed variant produces byte-identical output and gives ~8x speedup at N=6000 (the gap widens with N). I'm happy to open a PR with the fix and a regression/scaling test.