A major power utility had tens of thousands of sensors. After 7 days, their data was deleted - not because anyone wanted it gone, but because storing petabytes at cloud rates would bankrupt the department. Engineers learned to diagnose problems fast or not at all.
Why cloud-centric architectures fail for critical infrastructure
Engineers needed sensor data from 8 days ago. IT had to explain it was already gone. The retention limit wasn't policy - it was budget reality. Cloud storage for petabytes would cost more than the entire IT budget.
High-frequency sensor data streams in at thousands of samples per second from every substation. Shipping it all to the cloud costs a fortune, and most of it is noise engineers never need anyway.
Critical infrastructure can't expose operational data over the public internet. Compliance requires air-gapped operations. PII from customer meters must be scrubbed before any data leaves the substation.
Each substation runs indexing and analysis. Raw sensor data stays on local storage. Only metadata and anomalies go to the cloud.
Each substation cluster runs ML models that index sensor events - voltage anomalies, frequency deviations, equipment signatures, timestamps. Instead of storing petabytes of raw waveforms, store gigabytes of metadata and flagged events.
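Here's a rough sketch of what one index entry can look like - a simple threshold detector standing in for the actual ML models, with every name, field, and threshold below invented for illustration:

```python
import hashlib
import json
import time

import numpy as np

NOMINAL_HZ = 60.0
FREQ_TOLERANCE_HZ = 0.05   # flag anything beyond +/- 0.05 Hz (illustrative threshold)

def index_window(samples, sample_rate, window_start, sensor_id):
    """Turn one raw waveform window into a compact index record.

    Only the record goes upstream; the raw samples stay on local storage.
    """
    # Crude frequency estimate from zero crossings - a stand-in for the real classifier.
    crossings = np.flatnonzero(np.diff(np.signbit(samples).astype(np.int8)))
    est_hz = (len(crossings) / 2) / (len(samples) / sample_rate)

    deviation = est_hz - NOMINAL_HZ
    if abs(deviation) < FREQ_TOLERANCE_HZ:
        return None   # normal window: nothing worth indexing

    return {
        "sensor_id": sensor_id,
        "event_type": "frequency_deviation",
        "window_start": window_start,                  # epoch seconds
        "window_seconds": len(samples) / sample_rate,
        "deviation_hz": round(deviation, 4),
        "rms": float(np.sqrt(np.mean(samples ** 2))),
        # Content hash lets engineers locate the exact raw file later.
        "waveform_sha256": hashlib.sha256(samples.tobytes()).hexdigest(),
    }

# A few hundred bytes of JSON per flagged event instead of the full waveform.
record = index_window(np.random.randn(4096), sample_rate=4096.0,
                      window_start=time.time(), sensor_id="sub-12/xfmr-3")
if record:
    print(json.dumps(record, indent=2))
```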
Raw sensor data stays on local NAS at each substation. Cost: $0.003/GB/month locally vs $0.023/GB/month in the cloud. Go from 7-day retention to 5 years for less money. Engineers search the index, retrieve only what they need.
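The arithmetic behind those rates, purely as an illustration (real pricing varies by provider and by how you amortize the NAS hardware):

```python
# Illustrative storage-cost arithmetic using the per-GB rates above.
PB_IN_GB = 1_000_000          # 1 PB expressed in GB (decimal)
cloud_rate = 0.023            # $/GB/month, typical cloud object storage
local_rate = 0.003            # $/GB/month, amortized local NAS

cloud_monthly = PB_IN_GB * cloud_rate   # $23,000 per PB per month
local_monthly = PB_IN_GB * local_rate   # $3,000 per PB per month

print(f"Cloud: ${cloud_monthly:,.0f}/PB/month")
print(f"Local: ${local_monthly:,.0f}/PB/month (~{cloud_rate / local_rate:.1f}x cheaper per GB)")
```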
Customer meter data gets scanned for PII before leaving the substation. Automated redaction ensures compliance. Only anonymized, aggregated data flows upstream.
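A toy version of that redaction pass - regex-based scrubbing of a few common field patterns, not the production ruleset:

```python
import re

# Hypothetical patterns for fields that must never leave the substation.
PII_PATTERNS = {
    "account_number": re.compile(r"\b\d{10}\b"),
    "service_address": re.compile(r"\d+\s+\w+\s+(Street|St|Ave|Avenue|Road|Rd)\b", re.IGNORECASE),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(record: dict) -> dict:
    """Return a copy of a meter record with PII scrubbed before anything is uploaded."""
    clean = {}
    for key, value in record.items():
        if not isinstance(value, str):
            clean[key] = value
            continue
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[REDACTED:{label}]", value)
        clean[key] = value
    return clean

meter_reading = {
    "meter_id": "MTR-4471",
    "kwh": 12.7,
    "notes": "Customer at 1420 Oak Street, acct 5551234567, called 555-867-5309",
}
print(redact(meter_reading))
```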
Every data transformation, every anomaly detection, every control decision gets cryptographically signed. Full audit trail for regulatory compliance, all air-gapped.
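One way to do the signing, sketched with Ed25519 from Python's `cryptography` library; key storage (HSMs) and the real audit format are glossed over:

```python
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# In practice this key would live in an HSM at the substation, never in code.
signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()

def signed_audit_entry(action: str, payload: dict) -> dict:
    """Wrap a transformation or decision in a signed, timestamped audit entry."""
    entry = {"action": action, "payload": payload, "timestamp": time.time()}
    message = json.dumps(entry, sort_keys=True).encode()
    return {"entry": entry, "signature": signing_key.sign(message).hex()}

def verify_entry(record: dict) -> bool:
    """Auditors can verify entries offline with just the public key."""
    message = json.dumps(record["entry"], sort_keys=True).encode()
    try:
        verify_key.verify(bytes.fromhex(record["signature"]), message)
        return True
    except InvalidSignature:
        return False

record = signed_audit_entry("pii_redaction", {"meter_id": "MTR-4471", "fields_redacted": 3})
print(verify_entry(record))  # True
```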
Lightweight compute nodes at each substation cluster. Ingest sensor streams from nearby transformers, switches, and meters. Process locally before data moves.
Run time-series analysis, anomaly detection, and event classification. Extract features, timestamps, and signatures. A petabyte problem becomes a gigabyte problem.
Raw waveforms stay on cheap local storage at the substation. Engineers search the index, retrieve only the specific time windows they need. No bandwidth costs.
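What retrieval can look like in practice, assuming a SQLite index and hour-partitioned waveform files on the NAS - the schema and paths here are made up for illustration:

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

NAS_ROOT = Path("/mnt/nas/waveforms")   # hypothetical local NAS mount

def find_events(conn, sensor_id, start, end):
    """Search the metadata index for flagged events in a time window."""
    return conn.execute(
        """SELECT sensor_id, event_type, window_start
           FROM events
           WHERE sensor_id = ? AND window_start BETWEEN ? AND ?
           ORDER BY window_start""",
        (sensor_id, start.timestamp(), end.timestamp()),
    ).fetchall()

def waveform_path(sensor_id, window_start):
    """Map an index hit back to the raw file on the substation NAS."""
    hour = datetime.fromtimestamp(window_start, tz=timezone.utc).strftime("%Y-%m-%d_%H")
    return NAS_ROOT / sensor_id / f"{hour}.parquet"

# Demo with an in-memory index holding one flagged event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor_id TEXT, event_type TEXT, window_start REAL)")
conn.execute("INSERT INTO events VALUES (?, ?, ?)",
             ("sub-12/xfmr-3", "frequency_deviation",
              datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc).timestamp()))

hits = find_events(conn, "sub-12/xfmr-3",
                   datetime(2024, 3, 1, tzinfo=timezone.utc),
                   datetime(2024, 3, 2, tzinfo=timezone.utc))
for sensor, event_type, ts in hits:
    # Engineers pull only this file over the WAN - megabytes, not petabytes.
    print(event_type, waveform_path(sensor, ts))
```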
Cryptographically sign all transformations and decisions. Scan for PII, redact automatically. Only clean, signed, anonymized data flows to central systems.
If your retention limits are budget-driven, not policy-driven, we should talk. We've deployed at utilities, cities, and critical infrastructure.