The Bronze Tier is Far From Gold: Avoiding Toxic Assets (And Other Medallion Architecture Lies)
LET’S SAY you’re a hot new drone company. Doing, I don’t know, crowd counting at festivals.
One 4K camera generates roughly 45 gigabytes of footage per hour. That’s your baseline. Add the rest of the payload (battery and motor telemetry, LiDAR for terrain mapping, thermal imaging, flight logs, whatever) and call it 60 gigabytes per hour for a properly equipped commercial drone.
Manageable for one flight. But you aren’t a hobbyist. You’re running a fleet.
You have 100 drones flying 8-hour shifts (these festivals go on FOREVER).
100 drones × 60 GB/hour × 8 hours = 48 terabytes daily.
That is 17.5 petabytes annually.
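If you want to run the numbers against your own fleet, the arithmetic fits in a few lines. The constants below are the assumptions from this hypothetical deployment, not industry benchmarks, so swap in your own:

```python
# Back-of-the-envelope fleet volume. Constants are the assumptions above.
GB_PER_DRONE_HOUR = 60   # 4K video + LiDAR + thermal + telemetry
DRONES = 100
HOURS_PER_DAY = 8        # one festival-length shift

daily_tb = DRONES * GB_PER_DRONE_HOUR * HOURS_PER_DAY / 1_000  # GB -> TB
annual_pb = daily_tb * 365 / 1_000                             # TB -> PB

print(f"{daily_tb:.0f} TB per day")    # 48 TB per day
print(f"{annual_pb:.1f} PB per year")  # ~17.5 PB per year
```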
This is the scale nobody thinks about when a solutions architect waves a hand and says, “Just dump it in the Bronze tier and figure it out downstream.”
“BUT WAIT,” you say. “I’M NOT DOING DRONES.”
Fine. The math comes for you anyway:
Ad Tech: Real-time bidding processes millions of requests per second. By the time data hits Bronze, the auction closed 800ms ago. You’re building dashboards on ghosts.
Observability: A mid-sized Kubernetes cluster generates terabytes of logs daily. If you strip trace context at ingest to save bandwidth, you saved $50 on storage to lose $500k in downtime.
Connected Vehicles: Autonomous vehicles can generate up to 5TB per hour. You store it indefinitely because nobody built the systems to know what is safe to delete.
The numbers change. The pattern doesn’t. You are moving data that loses value in transit, storing context-free noise, and creating technical debt with every gigabyte.
The “Hadoop Hangover”
We are suffering from a collective hangover. Ten years ago, storage became cheap enough that we stopped curating data. The philosophy shifted to ELT (Extract, Load, Transform). Just load it all. We’ll find the value later.
That worked for CRM records. It collapses under high-frequency sensor data.
The result? Downstream engineers inherit garbage. Data scientists file tickets instead of building models. The “Bronze Tier” (the foundation that was supposed to enable everything) becomes a landfill with a fancy name.
This isn’t primarily a cost problem. It’s a control problem.
Every new data source makes the system more fragile. Every request for “clean data” triggers a three-week cleanup project. This is the definition of a Data Swamp.
The 68% You Are Carrying (And Paying For)
Here is the uncomfortable truth about cloud bills: Organizations are paying a premium to store digital trash.
According to Seagate’s “Rethink Data” report, only 32% of enterprise data ever gets used.
The remaining 68% sits in storage, consuming resources and providing zero value.
For IoT, it’s worse. Up to 90% of sensor data is never analyzed. It is the steady hum of a machine working normally.
Think about what this means for engineering teams.
Every schema change. Every pipeline update. Every miscategorized file. All of that effort goes into data that has a 68% to 90% chance of never mattering.
And teams can’t fix it. Because the Bronze tier captured raw data without context, nobody knows which 68% is the waste.
Fear drives architecture. Organizations keep it all. The entropy grows.
The $200,000 Janitor
Companies compete with Meta and Google for talent, paying top-tier salaries to hire data engineers.
Then they hand them a mop.
According to Anaconda’s State of Data Science report, data professionals spend roughly 40% of their time on data preparation and cleaning.
This is skilled talent spending nearly half their week on work that adds no analytical value. They aren’t answering business questions. They are digitally scrubbing floors because someone upstream didn’t wipe their feet.
Fifty-seven percent of data engineers say this is the least enjoyable part of their job.
When your best engineer quits, don’t look at the comp package. Look at the architecture. They left because they got tired of fixing JSON parsing errors in the Bronze layer.
Context Doesn’t Survive the Trip
There is a fundamental misunderstanding about “Raw” data.
Executives believe raw data is like gold ore, valuable regardless of where you process it.
Data isn’t ore. It’s concrete. It is useful for a short window. Once it sets, it’s a rock.
A sensor reading’s value depends entirely on context.
What was the equipment doing? What were the environmental conditions? What firmware version was running?
That context exists at the moment of creation. Move raw data to a central repository without capturing it, and you strip the meaning. You can stare at a temperature spike in your Silver tier three days later, but you can’t know if it was a critical failure or a routine calibration.
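Here is a minimal sketch of the difference, with hypothetical field names and device helpers standing in for whatever your runtime actually exposes. The point is that the second function can only be written on the device, at the moment the reading is taken:

```python
import time

def reading_as_it_lands_in_bronze(sensor):
    # A number and a timestamp. Three days later, this is all Silver has.
    return {"sensor_id": sensor.id, "temp_c": sensor.read(), "ts": time.time()}

def reading_with_context(sensor, device):
    # The same reading, enriched while the answers still exist.
    # The device.* helpers are hypothetical stand-ins for your platform's APIs.
    return {
        "sensor_id": sensor.id,
        "temp_c": sensor.read(),
        "ts": time.time(),
        "firmware": device.firmware_version(),  # what code was running?
        "mode": device.operating_mode(),        # normal run or calibration?
        "ambient_c": device.ambient_temp(),     # environmental conditions
    }
```

Strip those last three fields and the temperature spike in your Silver tier is just a number with no story.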
Google researchers call this “Data Cascades”: upstream quality issues that compound into massive downstream failures in AI models.
Speaking of AI: You cannot build high-performing models on a low-quality foundation.
If you point an LLM at a Bronze tier full of unvalidated, context-free data, you don’t get magic. You get hallucinations. You get a model that confidently tells you to shut down a healthy machine because it trained on unlabeled maintenance data.
The “dump it in Bronze” approach effectively disqualifies organizations from the AI race.
Compliance is an Architecture Problem
GDPR fines can reach 4% of global annual revenue.
When you aggregate data indiscriminately, you are moving PII across jurisdictional boundaries. The “figure it out downstream” mentality assumes compliance is a transformation problem.
It is not.
Once PII enters your Bronze tier without controls, you have created an immutable audit trail of how you mishandled user data.
“Right to be Forgotten”? Good luck finding every instance of a user in a petabyte-scale lake where schemas drift daily.
“Data Sovereignty”? You just replicated German user data to a US-East bucket because your ingestion script was lazy.
Organizations are building a liability engine.
The Solution: The Assay Office Model
Historical gold mines maintained Assay Offices on site. They tested samples before shipping. Sending low-grade ore to distant refineries made no economic sense.
Data infrastructure needs an Assay Office: Processing capability at the point of creation.
The industry needs to shift validation, enrichment, and reduction to the Left, ideally to the Edge. (And let’s be honest, anything that involves an ethernet cable might as well be called “edge” for all your data warehouse team cares.)
Here is what Upstream Processing gives you (there’s a sketch covering all four after the list):
Schema Validation at the Gate: If a sensor sends malformed JSON, reject it before it enters Bronze. Catching problems at the source prevents the multiplication of bad data.
Context Capture: Attach metadata and operational state at the point where they naturally exist. Downstream teams get self-describing data.
Data Reduction: If you monitor a sensor at 1,000Hz but the value only changes every 10 seconds, intelligent ingestion lets you record only the changes. You reduce storage costs by 99% without losing insight.
Compliance Firewalling: Detect PII at the source. Hash it or drop it before it leaves the secure zone. No PII in Bronze means no audit findings.
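A minimal sketch of what that gate can look like, covering all four items above. This is not any specific product’s API: the field names, thresholds, and the device_context argument are assumptions, and a real deployment would add dead-lettering, retries, and proper key management around the hashing.

```python
import hashlib
import json
import time

REQUIRED_FIELDS = {"sensor_id", "temp_c", "ts"}    # assumed schema for this sketch
PII_FIELDS = {"operator_email", "operator_name"}   # never leaves the secure zone
DEADBAND_C = 0.5                                   # forward only meaningful changes

_last_sent: dict[str, float] = {}                  # sensor_id -> last forwarded value

def process_at_edge(raw_line: str, device_context: dict) -> dict | None:
    """Validate, reduce, scrub, and enrich one reading before it ships to Bronze.
    Returns an enriched record to forward, or None to drop the reading."""
    # 1. Schema validation at the gate: malformed JSON never enters Bronze.
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # or write to a local dead-letter file for inspection
    if not REQUIRED_FIELDS.issubset(record):
        return None

    # 2. Data reduction: drop readings inside the deadband (value barely moved).
    last = _last_sent.get(record["sensor_id"])
    if last is not None and abs(record["temp_c"] - last) < DEADBAND_C:
        return None
    _last_sent[record["sensor_id"]] = record["temp_c"]

    # 3. Compliance firewalling: hash PII before it crosses any boundary.
    for field in PII_FIELDS & record.keys():
        record[field] = hashlib.sha256(record[field].encode()).hexdigest()

    # 4. Context capture: attach operational state while it still exists.
    record.update(device_context)        # e.g. firmware version, operating mode
    record["processed_at"] = time.time()
    return record
```

The deadband check alone is where the drone math changes: a value that moves every ten seconds does not need to be shipped a thousand times a second.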
The Question That Matters
Look at your box labeled “Raw Zone.” Is it an asset or a liability?
If engineers spend more time fixing pipelines than building them, it is a liability.
If every upstream change triggers downstream breakage, it is a liability.
If you are storing petabytes but cannot confidently delete anything, it is a liability.
The fix isn’t abandoning the Medallion model. It is recognizing that the hard work happens before Bronze.
Teams that do this work upstream move faster. They spend 80% less time on data firefighting.
The teams that don’t? They keep paying the Bronze Tax.
Stop building a landfill. Start building an Assay Office.
Ready to clean up your Bronze tier? Check out Expanso’s packages for getting started with intelligent data pipelines that process where data is created, not after it’s become someone else’s problem. See examples in action or dive into the technical documentation.
