Skip to content
AlgoCoder
Report № DA-02Data EngineeringSub-pattern · Schema validation02 / 12

Schema Validation Layer for a Platform With Recurring Production Breaks

Source systems were changing schemas without telling the data team. The downstream pipeline broke once a week.

§ Client

A data platform consuming feeds from multiple internal source systems owned by different engineering teams. The source systems were under active development by their respective teams; schema changes happened on those teams' release cadences without coordination with the data team that consumed the feeds.

§ Problem

Production breaks were recurring on a weekly cadence. A source team would change a column type, add a required field, rename something, or drop a field that had been deprecated upstream but was still in active use downstream. The data pipeline would fail in the next run, the data team would diagnose the failure, contact the source team, and patch the pipeline. The cycle was predictable and consumed engineering capacity that the data team needed for forward work.

§ Engagement

A schema validation layer at every source system boundary, paired with a contract testing process the source teams could participate in.

Each source feed was assigned an explicit schema definition — column names, types, nullability, constraints. The schema was version-controlled and lived in a repository the data team and the source teams both had visibility into. Inbound data was validated against the schema at the ingestion boundary. Records or batches that violated the schema were quarantined rather than propagating downstream.

Where source teams were willing to participate, contract tests were added to their release pipelines. A source system that intended to change a schema ran the contract test against the data team's expected schema as part of its own release process; schema changes that would break downstream were surfaced before deployment rather than after.

For source teams who couldn't or wouldn't participate, the validation layer caught the change at runtime and surfaced it to the data team immediately rather than letting the pipeline fail downstream. The diagnostic surface — what changed, when, in which feed — became immediate rather than requiring archeology.

A schema registry was set up to make the current expected schemas discoverable across the organization. Source teams considering a change could see what consumers depended on the field they were thinking of removing.

§ Outcome

Pipeline breaks attributable to source schema drift became rare. The breaks that did occur were caught at the ingestion boundary rather than propagating into reports. The data team's engineering capacity recovered the time previously consumed by reactive schema-break remediation. The relationship between the data team and the source teams improved as the coordination surface became explicit rather than implicit.

End of Report
DA-02 · Data
End of Transmission

Building something with shape similar to this?

Talk to a Data Architect →