How We Reduced a 1.5GB Database by 99%: A 2025 Case Study

Lisa Anderson

December 25, 2025

Discover how a real development team transformed a bloated 1.5GB database into a lean 15MB powerhouse. This 2025 case study reveals practical optimization techniques, JSON compression strategies, and API integration approaches you can apply immediately.

The 1.5GB Problem That Was Killing Our Performance

Let me tell you about a problem that was keeping me up at night. We had this database—nothing fancy, just a PostgreSQL instance powering a moderately successful SaaS application. But it kept growing. And growing. Before we knew it, we were staring at a 1.5GB database for what should have been a simple service.

The symptoms were classic: slow queries, sluggish API responses, ballooning storage costs. Our users were starting to notice. The worst part? We knew most of that data was redundant. We were storing JSON blobs—lots of them—and each one contained massive amounts of repeated information.

Here's what we discovered: we weren't just storing data inefficiently. We were actively working against our own performance goals. Every API call was slower than it needed to be. Every backup took forever. And our cloud bill? Let's just say it was getting uncomfortable.

But here's the good news: we fixed it. We reduced that 1.5GB database to just 15MB—a 99% reduction. And the techniques we used weren't magic. They're approaches you can implement right now in your own systems.

Understanding the JSON Blob Problem

Our core issue was JSON storage. We were using PostgreSQL's JSONB columns (which are generally great, by the way), but we'd fallen into a common trap. Each record contained complete, self-contained JSON objects with massive amounts of repeated data.

Think about it like this: imagine storing customer information. Instead of having a normalized customers table with addresses, we were storing the complete customer object—name, address, contact info, preferences—in every single transaction record. Multiply that by thousands of transactions per customer, and you've got serious duplication.
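
To make the duplication concrete, here is a simplified sketch (the field names are made up, not our actual schema) of the kind of transaction record we're describing, next to its normalized counterpart:

```python
# Illustrative only: a transaction row that embeds the full customer object
# instead of referencing a customers table. Field names are hypothetical.
bloated_transaction = {
    "transaction_id": "txn_000123",
    "amount": 49.95,
    "currency": "USD",
    "customer": {                      # repeated verbatim in every transaction
        "name": "Jane Example",
        "email": "jane@example.com",
        "billing_address": {
            "line_1": "1 Example Street",
            "city": "Springfield",
            "country": "US",
        },
        "preferences": {"newsletter": True, "locale": "en-US"},
    },
}

# The normalized alternative stores only a reference to the customer row.
normalized_transaction = {
    "transaction_id": "txn_000123",
    "amount": 49.95,
    "currency": "USD",
    "customer_id": 42,                 # join to customers when the details are needed
}
```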

The JSONB format itself wasn't the villain here. PostgreSQL actually compresses JSONB pretty well. The problem was our data model. We were treating the database like a document store without considering the relational aspects that could save us space.

What made this particularly painful was how it affected our API. Every endpoint had to parse these massive JSON objects, extract what was needed, and send it along. The overhead was enormous. And because the data was duplicated everywhere, updating a customer's address meant updating thousands of records.

The Normalization vs. Denormalization Dance

This is where things get interesting. The programming community often debates normalization versus denormalization, and our case was a perfect example of why context matters.

We started by asking a simple question: what data changes independently? Customer information changes independently of transactions. Product details change independently of orders. Once we identified these independent entities, we could separate them.

But here's the twist: we didn't go full normalization. That's important. Complete normalization would have meant dozens of tables and complex joins that could hurt read performance. Instead, we took a hybrid approach.

We created reference tables for truly static data—things like product categories, country codes, currency types. These tables are small, rarely change, and can be cached entirely in memory. For frequently changing but shared data (like customer profiles), we normalized but added strategic denormalization for performance-critical paths.
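
Here's a minimal sketch of what that hybrid layout can look like. The table and column names are hypothetical and the connection string is a placeholder; the point is the shape: tiny reference tables, a normalized customers table, and one deliberately duplicated field on the hot read path.

```python
# Hypothetical schema sketch: reference tables for static data, a normalized
# customers table, and one intentional, managed duplication on transactions.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS currencies (
    code CHAR(3) PRIMARY KEY,           -- tiny, static, cacheable in memory
    name TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS customers (
    id      BIGSERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL,
    profile JSONB                       -- variable attributes stay as JSON
);

CREATE TABLE IF NOT EXISTS transactions (
    id            BIGSERIAL PRIMARY KEY,
    customer_id   BIGINT NOT NULL REFERENCES customers(id),
    currency_code CHAR(3) NOT NULL REFERENCES currencies(code),
    amount        NUMERIC(12, 2) NOT NULL,
    customer_name TEXT NOT NULL         -- intentional duplication for a hot read path
);

-- Foreign key columns get their own indexes.
CREATE INDEX IF NOT EXISTS idx_transactions_customer_id ON transactions (customer_id);
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```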

The key insight? Not all duplication is bad. Some duplication is strategic caching. The problem was we had accidental, uncontrolled duplication. We replaced that with intentional, managed duplication where it made sense.

JSON Compression Techniques That Actually Work

Even after normalization, we still had JSON data to store. Some information naturally fits JSON—configuration objects, variable attributes, API responses. For these cases, we implemented several compression strategies.

First, we stopped storing pretty-printed JSON. This seems obvious, but you'd be surprised how many development databases contain human-readable JSON with spaces, newlines, and tabs. Removing whitespace alone gave us a 20-30% reduction in some tables.
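
In Python, for example, the difference is just the separators argument to json.dumps:

```python
import json

payload = {"customer_id": 42, "status": "active", "tags": ["beta", "priority"]}

# Pretty-printed: convenient for humans, wasteful in storage.
pretty = json.dumps(payload, indent=2)

# Compact: no indentation, no space after ',' or ':'.
compact = json.dumps(payload, separators=(",", ":"))

print(len(pretty), len(compact))  # the compact form is noticeably smaller
```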

Next, we implemented key shortening. Instead of storing keys like "customer_billing_address_line_1", we stored "cba1". We maintained a lookup table (in memory, of course) to translate between human-readable keys and shortened keys. This approach is similar to how protocol buffers work, and it's incredibly effective for repetitive JSON structures.
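
A minimal sketch of the idea, with a hypothetical key map; the savings add up when the same long keys repeat across millions of blobs:

```python
# Hypothetical mapping between human-readable keys and the short keys we store.
KEY_MAP = {
    "customer_billing_address_line_1": "cba1",
    "customer_billing_address_city": "cbac",
    "customer_shipping_address_line_1": "csa1",
}
REVERSE_MAP = {v: k for k, v in KEY_MAP.items()}

def shorten(doc: dict) -> dict:
    """Replace known long keys with their short forms before storing."""
    return {KEY_MAP.get(k, k): v for k, v in doc.items()}

def expand(doc: dict) -> dict:
    """Restore the human-readable keys when reading back."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

record = {"customer_billing_address_line_1": "1 Example Street", "amount": 49.95}
stored = shorten(record)   # {'cba1': '1 Example Street', 'amount': 49.95}
assert expand(stored) == record
```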

We also started using PostgreSQL's TOAST (The Oversized-Attribute Storage Technique) more effectively. By default, PostgreSQL tries to keep data in-line, but for large values, it moves them to TOAST tables. We adjusted our storage parameters to be more aggressive about moving JSON values larger than roughly 500 bytes into TOAST.
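
Assuming a hypothetical transactions table with a metadata JSONB column, that tuning looks roughly like this (toast_tuple_target needs PostgreSQL 11 or later):

```python
# Sketch of TOAST tuning; table and column names are hypothetical.
import psycopg2

TUNING = """
-- Attempt to toast tuples once they exceed ~512 bytes instead of the ~2KB default.
ALTER TABLE transactions SET (toast_tuple_target = 512);

-- EXTENDED (the JSONB default) means out-of-line storage with compression;
-- we state it explicitly here for clarity.
ALTER TABLE transactions ALTER COLUMN metadata SET STORAGE EXTENDED;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(TUNING)
```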

But here's my favorite technique: we started compressing JSON at the application level before it even hit the database. For data that was written once and read many times (like configuration templates), we'd gzip it in our application code and store it as binary data. The CPU cost was minimal compared to the storage savings.
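
Here's a sketch of that pattern, with hypothetical table and column names, storing the gzipped JSON in a BYTEA column:

```python
# Application-level compression for write-once/read-many JSON blobs.
import gzip
import json
import psycopg2

def pack(doc: dict) -> bytes:
    """Serialize compactly, then gzip."""
    return gzip.compress(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def unpack(blob: bytes) -> dict:
    return json.loads(gzip.decompress(blob).decode("utf-8"))

template = {"layout": "two-column", "widgets": ["chart", "table"] * 50}

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO config_templates (name, body) VALUES (%s, %s)",
        ("default-dashboard", pack(template)),  # Python bytes map to BYTEA
    )
```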

API Integration: The Hidden Optimization Layer

This is where our database optimization story intersects with API design. You see, our bloated database wasn't just a storage problem—it was an API problem. And fixing one required fixing the other.

We redesigned our API to follow a key principle: fetch only what you need. Instead of endpoints returning complete objects with all nested data, we implemented sparse fieldsets. Clients could specify exactly which fields they wanted, and our API would fetch only those.
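
A minimal sketch of sparse fieldsets, using Flask and hypothetical endpoint, table, and column names. The whitelist is what keeps the dynamically built SELECT safe:

```python
from flask import Flask, jsonify, request
import psycopg2

app = Flask(__name__)

# Only whitelisted columns can appear in the query, so the f-string is safe.
ALLOWED_FIELDS = {"id", "name", "email", "created_at"}

@app.get("/customers/<int:customer_id>")
def get_customer(customer_id):
    requested = set(request.args.get("fields", "id,name").split(","))
    fields = sorted(requested & ALLOWED_FIELDS) or ["id"]

    query = f"SELECT {', '.join(fields)} FROM customers WHERE id = %s"
    with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
        cur.execute(query, (customer_id,))
        row = cur.fetchone()

    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(dict(zip(fields, row)))
```

A request like GET /customers/42?fields=id,email then touches only those two columns instead of the whole row.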

This had a cascading effect on our database design. Since we knew certain fields were always fetched together, we could store them together. Other fields that were rarely accessed could be moved to separate tables or even external storage.

We also implemented proper pagination. This seems basic, but our old API would sometimes return thousands of records in a single response. Now, we never return more than 100 records at once, and we use cursor-based pagination for consistency.
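
Roughly, cursor-based pagination looks like this, assuming an increasing integer primary key and hypothetical table names:

```python
import psycopg2

PAGE_SIZE = 100

def fetch_page(conn, after_id=0):
    """Return up to PAGE_SIZE rows with id greater than the cursor."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, customer_id, amount
            FROM transactions
            WHERE id > %s
            ORDER BY id
            LIMIT %s
            """,
            (after_id, PAGE_SIZE),
        )
        rows = cur.fetchall()
    next_cursor = rows[-1][0] if rows else None  # the client sends this back for the next page
    return rows, next_cursor

with psycopg2.connect("dbname=app") as conn:
    page, cursor = fetch_page(conn)
    while cursor is not None:
        page, cursor = fetch_page(conn, after_id=cursor)
```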

But the real game-changer was changing our mindset about API responses. We stopped thinking of them as direct database dumps and started thinking of them as curated data presentations. Sometimes, this meant creating materialized views specifically for common API queries. Other times, it meant pre-computing expensive aggregations.
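
As one illustration, a pre-computed per-customer spend summary might be a materialized view along these lines (names are hypothetical):

```python
import psycopg2

VIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS customer_spend_summary AS
SELECT customer_id,
       COUNT(*)    AS transaction_count,
       SUM(amount) AS lifetime_spend
FROM transactions
GROUP BY customer_id
WITH DATA;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(VIEW)
    # Refresh on a schedule or after batch writes so API reads stay cheap.
    cur.execute("REFRESH MATERIALIZED VIEW customer_spend_summary")
```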

Practical Steps You Can Take Today

Okay, enough theory. Let's talk about what you can actually do. If you're facing database bloat, here's where I'd start.

First, analyze your data. Use PostgreSQL's built-in tools to see what's taking up space. The pg_total_relation_size function is your friend. Look for tables with unexpected growth. Check for unused indexes—they can account for surprising amounts of space.
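
A query along these lines is a good starting point; it lists the ten largest tables by total size (heap plus indexes plus TOAST):

```python
import psycopg2

QUERY = """
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for name, size in cur.fetchall():
        print(f"{name}: {size}")
```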

Second, identify duplication. Write queries that look for repeated values in JSON fields. You might be shocked at what you find. In our case, we discovered that 60% of our JSON data was repeated across records.
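
A check along these lines (the table name and JSON key are hypothetical) compares the row count against the number of distinct embedded customer objects:

```python
import psycopg2

QUERY = """
SELECT COUNT(*)                           AS total_rows,
       COUNT(DISTINCT data -> 'customer') AS distinct_customer_blobs
FROM transactions;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    total, distinct = cur.fetchone()
    if total:
        print(f"{total} rows, {distinct} distinct customer blobs "
              f"({100 * (1 - distinct / total):.0f}% duplicated)")
```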

Third, consider your access patterns. What data is read together? What's written together? This will tell you what to normalize and what to keep together. Tools like pg_stat_statements can help you understand query patterns.
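
For example, the most expensive queries by total execution time (pg_stat_statements has to be installed and preloaded; on PostgreSQL 12 and earlier the column is total_time rather than total_exec_time):

```python
import psycopg2

QUERY = """
SELECT calls,
       ROUND(total_exec_time::numeric, 1) AS total_ms,
       LEFT(query, 80)                    AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for calls, total_ms, query in cur.fetchall():
        print(f"{calls:>10} calls  {total_ms:>12} ms  {query}")
```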

Fourth, implement compression gradually. Start with the low-hanging fruit: remove whitespace from JSON, enable PostgreSQL compression where appropriate, and consider application-level compression for large, static blobs.

Common Mistakes and How to Avoid Them

I've seen teams make the same mistakes over and over. Let me save you some pain.

Mistake #1: Over-normalizing. Yes, normalization reduces duplication, but it can also kill performance with excessive joins. Find the sweet spot. In my experience, third normal form is often sufficient, and going beyond that rarely pays off.

Mistake #2: Ignoring indexes. When you normalize, you need foreign keys. And foreign keys need indexes. Forgetting to index foreign key columns is a classic performance killer.

Mistake #3: Premature optimization. Don't compress everything. Don't normalize everything. Measure first, then optimize. Use PostgreSQL's EXPLAIN ANALYZE to understand query performance before and after changes.

Mistake #4: Forgetting about transactions. When you split data across tables, you need to think about atomicity. Use database transactions to ensure consistency. This is especially important when denormalizing for performance—you need to keep the duplicated data in sync.

Mistake #5: Not monitoring. Optimization isn't a one-time task. Set up monitoring to track database growth over time. We use a simple script that emails us if any table grows by more than 10% in a week.
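
For illustration only (a stripped-down sketch of the idea, not our actual script, and with the e-mail step left out): snapshot table sizes to a file and flag anything that grew more than 10% since the last run.

```python
import json
import pathlib
import psycopg2

SNAPSHOT = pathlib.Path("table_sizes.json")

QUERY = """
SELECT c.relname, pg_total_relation_size(c.oid)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname = 'public';
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    current = dict(cur.fetchall())

previous = json.loads(SNAPSHOT.read_text()) if SNAPSHOT.exists() else {}
for table, size in current.items():
    old = previous.get(table)
    if old and size > old * 1.10:
        print(f"WARNING: {table} grew {100 * (size / old - 1):.0f}% since the last check")

SNAPSHOT.write_text(json.dumps(current))
```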

The Tools That Made It Possible

We didn't do this with magic. We used specific tools and approaches that anyone can adopt.

For database analysis, pg_stat_statements was invaluable. It showed us which queries were running slowly and why. For space analysis, we used a combination of pg_total_relation_size and custom queries to understand our data distribution.

For JSON manipulation, we used PostgreSQL's built-in JSON functions extensively. The jsonb_set, jsonb_extract_path, and jsonb_strip_nulls functions became our best friends.
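
For a quick feel of what they do, here they are applied to literal values; jsonb_strip_nulls in particular trims null-valued keys before a blob is stored:

```python
import psycopg2

QUERY = """
SELECT jsonb_strip_nulls('{"name": "Jane", "nickname": null}'::jsonb)  AS stripped,
       jsonb_set('{"plan": "free"}'::jsonb, '{plan}', '"pro"')         AS updated,
       jsonb_extract_path('{"address": {"city": "Springfield"}}'::jsonb,
                          'address', 'city')                           AS city;
"""

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    print(cur.fetchone())
```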

For API testing, we used Postman collections to verify that our optimized database still delivered correct responses. We also implemented comprehensive integration tests to catch any data consistency issues.

And here's a pro tip: when dealing with large-scale data migration, do it in batches. We wrote migration scripts that processed records in batches of 1000, with pauses between batches to avoid overwhelming the database. It took longer, but it kept our application running smoothly throughout the process.
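
Here's a stripped-down illustration of that batching pattern (not the migration script itself; table and column names are hypothetical): walk the table by primary key, commit each batch, and pause between batches so normal traffic isn't starved.

```python
import time
import psycopg2

BATCH_SIZE = 1000
PAUSE_SECONDS = 0.5

def migrate_batch(cur, after_id):
    """Backfill customer_id from the embedded JSON for one batch; return the last id, or None when done."""
    cur.execute(
        """
        UPDATE transactions
        SET customer_id = (data -> 'customer' ->> 'id')::bigint
        WHERE id IN (
            SELECT id FROM transactions
            WHERE id > %s AND customer_id IS NULL
            ORDER BY id
            LIMIT %s
        )
        RETURNING id
        """,
        (after_id, BATCH_SIZE),
    )
    ids = [row[0] for row in cur.fetchall()]
    return max(ids) if ids else None

with psycopg2.connect("dbname=app") as conn:
    last_id = 0
    while last_id is not None:
        with conn.cursor() as cur:
            last_id = migrate_batch(cur, last_id)
        conn.commit()                  # one transaction per batch
        time.sleep(PAUSE_SECONDS)      # give normal traffic room to breathe
```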

Beyond Storage: The Performance Benefits

The storage savings were dramatic—99% reduction is nothing to sneeze at. But the performance improvements were even more impressive.

Our API response times dropped by 70% on average. Some endpoints that used to take 2-3 seconds now respond in under 500 milliseconds. Database backup times went from hours to minutes. And our cloud storage costs? They're about 10% of what they used to be.

But here's what surprised me: developer productivity improved too. With a smaller, cleaner database schema, new team members could understand our data model faster. Queries were simpler. Debugging was easier. The cognitive load of working with our system decreased significantly.

We also found that our application became more resilient. With less data to process, database locks became shorter and less contentious. Replication lag decreased. Even our monitoring became more effective—with less noise in the system, real problems were easier to spot.

Your Action Plan for 2025

So where should you start? If I were you, I'd begin with these three steps.

First, measure your current state. How big is your database? What's growing fastest? What queries are slowest? You can't optimize what you don't measure.

Second, pick one table to optimize. Don't try to fix everything at once. Choose the table that's causing the most pain or taking the most space. Apply the techniques we discussed: look for duplication, consider normalization, implement compression.

Third, monitor the results. Did performance improve? Did storage decrease? Use this feedback to decide what to optimize next.

Remember: database optimization isn't about perfection. It's about continuous improvement. Start small, measure everything, and iterate. The 99% reduction we achieved didn't happen overnight. It was the result of many small optimizations over several months.

Your database doesn't have to be a performance bottleneck. With the right approach, it can become a well-oiled machine that supports your application efficiently for years to come. Start today—your future self will thank you.

Lisa Anderson

Tech analyst specializing in productivity software and automation.