How I Rescued My Cloud Bill From the Danger Zone (A Data Architecture Story)

Let me tell you about a cloud bill that was quietly becoming a problem.

It started innocently enough. A health-tech app, a handful of users, sensor data flowing in from wearables. Heart rate, HRV, RR intervals: the kind of data that tells you whether someone is thriving or about to crash. We needed to store it, query it fast, and not go broke doing it.

We picked AWS Timestream. It sounded perfect at the time: managed, serverless, purpose-built for time-series data. For the first few months? It was fine.

Then the data grew. The queries got more complex. One day I looked at the bill and thought: this is not fine.

The Problem With “Managed” Services Nobody Warns You About

Timestream bills queries by the amount of data they scan, on top of what you pay for writes and storage. You pay for what a query touches, not what it returns. Sounds reasonable until you’re running analytics across 63 million rows spanning three years, and every query is basically a full table scan because the data isn’t indexed the way you’d hope.

I’m not here to roast Timestream. It’s genuinely good for certain workloads. But for a system that needs to answer “give me this user’s heart rate data for the last 90 days” a thousand times a day? The cost-per-query model stings fast.
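To make the scaling concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a hypothetical placeholder (the per-GB price, the scan sizes), not Timestream’s actual rate or this project’s real figures. Only the thousand queries a day comes from the text above.

```python
def monthly_scan_cost(gb_scanned_per_query: float,
                      queries_per_day: int,
                      price_per_gb_scanned: float,
                      days: int = 30) -> float:
    """Cost of a scan-priced query workload over `days` days."""
    return gb_scanned_per_query * queries_per_day * price_per_gb_scanned * days


# Hypothetical inputs, for illustration only.
HYPOTHETICAL_PRICE = 0.01  # USD per GB scanned (placeholder, not a quoted rate)

narrow = monthly_scan_cost(0.01, 1000, HYPOTHETICAL_PRICE)  # query touches ~10 MB
wide = monthly_scan_cost(1.0, 1000, HYPOTHETICAL_PRICE)     # query touches ~1 GB
print(f"narrow scans: ${narrow:.2f}/month, wide scans: ${wide:.2f}/month")
```

The point is the ratio, not the absolute numbers: the same query run the same number of times costs a hundred times more when it scans a hundred times more data.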

Beyond cost: the old system was completely separate from our new platform. Two databases, two AWS accounts, two mental models. Every time someone on the team needed historical data, it was a whole thing.

Something had to change.

The Plan: Think Cheap, Think Long

Before writing a single line of code, I spent a week just thinking about the data. What does it look like? Who accesses it? How often? What’s the access pattern five years from now?

Here’s what I worked out:

Not all data is equal. The last 90 days of a user’s health data gets hit constantly: dashboards, trend analysis, alerts. Data from 2022? Almost never. But you still need to keep it. Healthcare data isn’t something you casually throw away.

This led to a pretty clean split:

Hot tier: Last 90 days, TimescaleDB (fast, indexed, queryable)
Cold tier: Everything older, GCS (cheap, durable, queryable when needed)

Simple idea. Surprisingly effective.
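As a minimal sketch of that routing rule in Python (the function name and the exact boundary handling are assumptions for illustration, not the project’s code):

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=90)  # from the split above: last 90 days stay hot


def tier_for(measured_at: datetime, now: datetime) -> str:
    """Return 'hot' (TimescaleDB) or 'cold' (GCS) for a measurement timestamp."""
    # Assumption: a reading exactly 90 days old still counts as hot.
    return "hot" if now - measured_at <= HOT_WINDOW else "cold"


now = datetime(2026, 5, 15, tzinfo=timezone.utc)
print(tier_for(datetime(2026, 4, 1, tzinfo=timezone.utc), now))  # recent reading
print(tier_for(datetime(2022, 6, 1, tzinfo=timezone.utc), now))  # old reading
```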

Designing the Archive (The Part I Actually Enjoyed)

GCS is just object storage, so the “schema” is really just how you organize files. Get it wrong and you’ll regret it forever. Get it right and future-you will send you a thank-you note.

I went with a Hive-style partition structure:

gs://my-health-archive/
  raw/
    measure_type=heart/
      year=2023/month=01/day=15/
        part-00002.parquet
    measure_type=hrv/
      year=2023/month=01/day=15/
        part-00002.parquet

A few deliberate choices here:

Measure type comes first. The most common access pattern is “give me all heart data for this date range”, not “give me all data for this one day.” Putting measure type at the top means you can skip entire subtrees instantly.

Date partitioning for range queries. Want January 2024 heart data? List measure_type=heart/year=2024/month=01/. No full scan. BigQuery and other query engines that understand Hive partitioning can prune partitions on this structure natively.

File granularity is a balancing act. One massive file per day means you scan the whole thing to find one user. One file per user per day creates millions of tiny files and kills GCS list performance. I landed on partitioning by date and splitting by user within each partition.

Parquet + Snappy compression. Raw JSON from Timestream was enormous. Parquet with Snappy gave roughly 6-8x compression. The 63 million rows that would’ve been tens of gigabytes of JSON became something very manageable.
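The layout is easy to generate mechanically. The sketch below builds the Hive-style prefix for one measure type and day, and expands a date range into the prefixes a reader would list. The helper names are hypothetical, and it only constructs strings. It does not talk to GCS.

```python
from datetime import date, timedelta

BUCKET = "gs://my-health-archive"  # bucket name taken from the layout above


def partition_prefix(measure_type: str, day: date) -> str:
    """Hive-style prefix for one measure type on one day."""
    return (f"{BUCKET}/raw/measure_type={measure_type}/"
            f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/")


def prefixes_for_range(measure_type: str, start: date, end: date) -> list[str]:
    """Every daily prefix from start to end, inclusive."""
    days = (end - start).days
    return [partition_prefix(measure_type, start + timedelta(days=i))
            for i in range(days + 1)]


print(partition_prefix("heart", date(2023, 1, 15)))
print(len(prefixes_for_range("hrv", date(2024, 1, 1), date(2024, 1, 31))))
```

Zero-padding the month and day keeps the prefixes sorted lexicographically in date order, which is what makes prefix listing line up with range queries.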

The Lifecycle Policy That Pays For Itself

GCS has lifecycle policies that automatically move objects to cheaper storage classes over time.

Standard storage costs $0.020/GB/month. Coldline is $0.004. Archive is $0.0012. You barely notice it at small scale. At hundreds of gigabytes of health data, the difference is real.

I set up automatic transitions:

Age	Storage Class	Cost
0–90 days	Standard	$0.020/GB/mo
90 days – 1 year	Nearline	$0.010/GB/mo
1–7 years	Coldline	$0.004/GB/mo
7+ years	Archive	$0.0012/GB/mo

The data moves itself. No cron jobs. No manual intervention. Just a JSON config applied to the bucket once via Terraform.
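For illustration, here is roughly what those transitions look like as a GCS lifecycle document, generated in Python. This is a sketch rather than the project’s actual Terraform. It assumes the JSON shape GCS uses for lifecycle configuration (SetStorageClass actions with an age condition) and approximates 7 years as 7 × 365 days.

```python
import json

# (age in days, target storage class), from the table above.
# Assumption: 7 years is approximated as 7 * 365 days.
TRANSITIONS = [(90, "NEARLINE"), (365, "COLDLINE"), (7 * 365, "ARCHIVE")]


def lifecycle_config(transitions):
    """Build a lifecycle document with one SetStorageClass rule per step."""
    return {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": cls},
                    "condition": {"age": age_days},
                }
                for age_days, cls in transitions
            ]
        }
    }


print(json.dumps(lifecycle_config(TRANSITIONS), indent=2))
```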

Estimated cost for the full archive: ~$15/year in Year 1, dropping to ~$5/year by Year 3. For a permanent record of all historical health data. That’s a rounding error.
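The arithmetic behind an estimate like this is simple enough to sketch. The prices below come from the table above. The archive size is a hypothetical placeholder, and the model is deliberately crude: it prices a full year at a single storage class and ignores mid-year transitions, retrieval fees, and minimum storage durations.

```python
# (minimum age in days, storage class, USD per GB per month), from the table above.
TIERS = [
    (0, "Standard", 0.020),
    (90, "Nearline", 0.010),
    (365, "Coldline", 0.004),
    (7 * 365, "Archive", 0.0012),
]


def storage_class(age_days: int):
    """Return (class name, monthly price per GB) for an object of the given age."""
    name, price = TIERS[0][1], TIERS[0][2]
    for min_age, tier_name, tier_price in TIERS:
        if age_days >= min_age:
            name, price = tier_name, tier_price
    return name, price


def yearly_cost(size_gb: float, age_days: int) -> float:
    """Cost of holding size_gb for 12 months at the class matching age_days."""
    _, price = storage_class(age_days)
    return size_gb * price * 12


ARCHIVE_GB = 50  # hypothetical archive size, not a figure from this project
for age in (30, 200, 1000, 3000):
    name, _ = storage_class(age)
    print(f"{age:>5} days  {name:<9} ${yearly_cost(ARCHIVE_GB, age):.2f}/year")
```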

TimescaleDB for the Hot Path

For the stuff that actually needs to be fast (the last 90 days), I moved to TimescaleDB. It’s PostgreSQL with a time-series extension: hypertables, automatic partitioning by time, native compression for chunks older than a week. Since it’s Postgres under the hood, you get all the indexing, JSONB, and query power you already know.

We store metrics as JSONB wide-rows:

-- One row per heart measurement
INSERT INTO metrics (time, user_id, device, data)
VALUES (
  '2026-05-15T10:30:00Z',
  42,
  'polar-h10',
  '{"hr": 72, "bbi1": 832, "bbi2": 845, "bbi3": 819}'
)
ON CONFLICT (time, user_id, device)
DO UPDATE SET data = metrics.data || EXCLUDED.data;

One row. Every field from the original sensor reading in one JSONB document (four in this example). JSONB merge on conflict so re-runs are safe. This replaced what used to be a separate scalar write per field in Timestream, and made the data usable in SQL queries without contortions.
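The merge-on-conflict behavior is easy to model outside the database. This Python sketch uses a plain dict as a stand-in for the metrics table (it is not how the project talks to Postgres) and mirrors what the jsonb || operator does for objects: top-level keys are merged, and the right-hand side wins on duplicates.

```python
store = {}  # stand-in for the metrics table, keyed like its conflict target


def upsert(time: str, user_id: int, device: str, data: dict) -> None:
    """Insert a row, or merge `data` into the existing row on key conflict."""
    key = (time, user_id, device)
    # Like jsonb `||` on objects: shallow merge, right side wins.
    store[key] = {**store.get(key, {}), **data}


row = ("2026-05-15T10:30:00Z", 42, "polar-h10")
upsert(*row, {"hr": 72, "bbi1": 832})
upsert(*row, {"bbi2": 845, "bbi3": 819})
upsert(*row, {"hr": 72, "bbi1": 832})  # replayed batch: row is unchanged
print(store[row])
```

Replaying a batch leaves the row as it was, which is the property that makes a restartable migration safe.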

For HRV data (pre-computed rmssd, stress index, sdnn), I wrote directly to a derived_metrics table rather than re-triggering the computation worker. The worker derives those values from raw BBI samples, and the original samples can’t be recovered from the derived values. Re-derive without them and you get garbage. So I skipped the worker entirely.
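A small Python sketch shows why derived HRV values can’t be turned back into BBI samples. It uses the textbook definitions of RMSSD and SDNN (with the sample standard deviation, which is an assumption about the convention) and is not the project’s computation worker. Two different interval series produce the same derived numbers.

```python
import math
import statistics


def rmssd(bbi_ms):
    """Root mean square of successive differences between beat intervals."""
    diffs = [b - a for a, b in zip(bbi_ms, bbi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sdnn(bbi_ms):
    """Standard deviation of the intervals (sample standard deviation here)."""
    return statistics.stdev(bbi_ms)


a = [800, 850, 800]
b = [900, 950, 900]  # same shape, every interval shifted by 100 ms
print(rmssd(a), rmssd(b))
print(math.isclose(sdnn(a), sdnn(b)))
```

Both metrics depend only on differences between intervals, so any constant shift disappears. The mapping from samples to metrics is many-to-one, and there is nothing to invert.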

Budget Guardrails (Because Cloud Costs Are a Horror Movie)

One thing I do on every GCP project now: hard budget limits in Terraform.

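resource "google_billing_budget" "seren_budget" {
  billing_account = var.billing_account_id # assumed variable name
  display_name    = "seren-budget"         # assumed display name

  budget_filter {
    projects = ["projects/${var.project_id}"]
  }
  amount {
    specified_amount { units = tostring(var.budget_hard_kill_usd) }
  }

  threshold_rules { threshold_percent = 0.5 } # 50% warning
  threshold_rules { threshold_percent = 0.8 } # 80% warning
  threshold_rules { threshold_percent = 1.0 } # 100%: disable billing (shuts everything down)

  # The budget itself only sends notifications. The kill switch is a subscriber
  # on this topic (assumed name, not shown) that detaches billing from the project.
  all_updates_rule {
    pubsub_topic = google_pubsub_topic.budget_alerts.id
  }
}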

An alert at 50%, a warning at 80%, and a hard kill at 100% that disables billing on the entire project. Yes, that shuts everything down. Yes, that’s intentional. I would rather explain a 30-minute outage to myself than explain a $10,000 bill to anyone.

The budget lives in Terraform. It’s version-controlled. It can’t be accidentally clicked away in the console.
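The threshold logic itself fits in a few lines. This Python sketch only decides which action applies at a given spend level. The function and action names are hypothetical, and the step that actually disables billing (a call to the Cloud Billing API from whatever handles the budget notification) is not shown.

```python
def budget_action(spend_usd: float, budget_usd: float) -> str:
    """Map current spend to the action at the 50% / 80% / 100% thresholds."""
    ratio = spend_usd / budget_usd
    if ratio >= 1.0:
        return "disable_billing"  # hard kill: shuts the project down
    if ratio >= 0.8:
        return "warning"
    if ratio >= 0.5:
        return "alert"
    return "ok"


for spend in (10, 55, 85, 100):
    print(spend, budget_action(spend, budget_usd=100))
```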

Was It Worth It?

Yes. And not just for the cost savings.

The new architecture is easier to reason about. Hot data is in a real database you can query with SQL. Cold data is in object storage organized by date and measure type, readable by anything that speaks Parquet: BigQuery, DuckDB, pandas, Spark, pick one. The migration is restartable and idempotent. The infrastructure is codified.

The old system was a black box I had to log into AWS to interrogate. The new one I can describe to a new team member in five minutes.

If you’re running a time-series workload on a managed cloud database and the bill is creeping up, stop. Think about what data is actually hot. Everything else can live somewhere cheaper, organized well enough that you can still reach it when you need it.

If you’ve done something similar, or have opinions about Parquet partition strategies, I’d genuinely love to hear about it.