Introduction: The Time Series Dilemma in 2026
You know the feeling. Your PostgreSQL instance is groaning under the weight of time series data—IoT sensor readings, application metrics, financial tick data—and you're watching query performance degrade with each passing day. The traditional approaches just aren't cutting it anymore. Partitioning helps, but it's a maintenance nightmare. Specialized time series databases solve some problems but introduce new ones around data silos and operational complexity.
What if you could keep PostgreSQL's rock-solid reliability and rich ecosystem while gaining the scalability of modern data lake architectures? That's exactly what the community has been exploring with Apache Iceberg. I've been testing these patterns for months now, and I can tell you: the results are genuinely exciting. This isn't just theoretical—teams are running this in production right now.
In this guide, we'll walk through building a high-performance time series stack that combines the best of both worlds. We'll address the specific questions and concerns raised in community discussions, share practical implementation patterns, and help you avoid the pitfalls that others have encountered.
Why PostgreSQL + Iceberg Makes Sense for Time Series
Let's start with the obvious question: why bother with this combination at all? PostgreSQL is a fantastic general-purpose database, but time series workloads have unique characteristics. You're dealing with massive volumes of append-heavy data, time-based queries, and often need to maintain data for compliance or historical analysis. Traditional PostgreSQL partitioning can handle this to a point, but managing partitions becomes a full-time job.
Apache Iceberg changes the game. It's a table format specification that brings database-like capabilities to object storage. What makes it particularly interesting for time series is its hidden partitioning, schema evolution, and time travel features. You get the scalability of a data lake with the usability of a database.
Here's the real kicker: you don't have to choose between PostgreSQL and Iceberg. You can use both. Keep your recent, hot data in PostgreSQL for low-latency queries and transactional integrity. Then, seamlessly offload older data to Iceberg tables on S3 or similar object storage. The community has been experimenting with this hybrid approach, and the feedback has been overwhelmingly positive. One engineer put it perfectly: "It feels like having your cake and eating it too—PostgreSQL's reliability with cloud-scale storage economics."
Architecture Patterns That Actually Work
Based on the discussions I've seen and my own testing, there are three main patterns emerging for PostgreSQL-Iceberg time series architectures. Each has its trade-offs, and the right choice depends on your specific workload.
The Tiered Storage Pattern
This is probably the most common approach teams are adopting. You keep, say, the last 30 days of data in PostgreSQL for real-time queries and dashboards. Everything older gets moved to Iceberg. The beauty here is transparency—applications can query across both storage layers without knowing where the data physically lives.
I've implemented this using PostgreSQL's foreign data wrappers (FDW) with the iceberg_fdw extension. It's not perfect—there are some performance considerations we'll discuss later—but it works surprisingly well. One team mentioned they reduced their PostgreSQL storage costs by 70% while maintaining sub-second query performance for recent data.
The Write-Through Pattern
Some applications write directly to both PostgreSQL and Iceberg simultaneously. This pattern makes sense when you have extremely high write volumes or need immediate availability of historical data for analytics. The challenge here is ensuring consistency between the two systems.
From what I've seen, the most successful implementations use a message queue or change data capture (CDC) approach. Debezium capturing PostgreSQL changes and streaming them to Iceberg has worked well for several teams. The key insight one engineer shared: "Don't try to make it perfectly synchronous. Embrace eventual consistency where you can."
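Because at-least-once delivery means duplicates, the consumer that applies CDC events to Iceberg has to be idempotent. Here's a minimal sketch of that idea, assuming each change event carries the source's log sequence number (LSN); the event shape and the in-memory "sink" are illustrative stand-ins, not a Debezium or Iceberg API:

```python
# Sketch of an idempotent CDC applier: events may arrive more than once
# (at-least-once delivery), so a change is applied only if its LSN is
# newer than the last one applied for that row.

def make_applier():
    last_applied = {}  # (table, primary_key) -> highest LSN applied so far
    sink = []          # stands in for the actual Iceberg write path

    def apply(event):
        key = (event["table"], event["pk"])
        if event["lsn"] <= last_applied.get(key, -1):
            return False  # duplicate or stale redelivery: safe to skip
        last_applied[key] = event["lsn"]
        sink.append(event)
        return True

    return apply, sink

apply, sink = make_applier()
apply({"table": "metrics", "pk": 1, "lsn": 10, "value": 3.2})
apply({"table": "metrics", "pk": 1, "lsn": 10, "value": 3.2})  # redelivery, skipped
apply({"table": "metrics", "pk": 1, "lsn": 11, "value": 3.5})
```

A real pipeline would persist `last_applied` (or rely on Iceberg's commit semantics), but the shape of the logic is the same.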
The Analytics-First Pattern
This flips the traditional approach. You write time series data directly to Iceberg first, then sync summary or aggregated data back to PostgreSQL for operational queries. This works particularly well for IoT and monitoring use cases where raw data goes to Iceberg for long-term storage and analysis, while PostgreSQL holds aggregated metrics for dashboards.
A team monitoring industrial equipment told me they process 10TB of sensor data monthly this way. Their PostgreSQL instance handles the dashboard queries on pre-aggregated data, while data scientists query the raw Iceberg tables directly using Spark or Trino.
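The aggregation step in this pattern is simple enough to sketch. Assuming readings arrive as (epoch-seconds, value) pairs, bucketing them into the 5-minute rollups that land in PostgreSQL looks roughly like this (the function name and shapes are mine, for illustration):

```python
from collections import defaultdict
from statistics import mean

def bucket_5min(readings):
    """Aggregate (epoch_seconds, value) pairs into 5-minute buckets."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % 300].append(value)  # 300 s = 5 minutes
    return {start: {"min": min(v), "max": max(v), "avg": mean(v)}
            for start, v in sorted(buckets.items())}

# The first three readings fall in the bucket starting at t=0,
# the last one in the bucket starting at t=300.
readings = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 10.0)]
rollups = bucket_5min(readings)
```

In production this would run in the streaming or batch job that reads from Iceberg, with the resulting rows upserted into PostgreSQL.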
Implementation: Getting Your Hands Dirty
Alright, let's talk about actually building this. The community discussion raised some excellent questions about implementation details, so let me address those directly.
First, setting up Iceberg. You'll need a catalog (I prefer the REST catalog for its simplicity), object storage configured, and the Iceberg libraries. The PostgreSQL side needs the iceberg_fdw extension. Installation has gotten much easier in 2026—most package managers have up-to-date versions.
Here's a practical example of creating a hybrid time series table:
-- Create the Iceberg table (Spark SQL syntax; Trino uses WITH (partitioning = ...))
CREATE TABLE iceberg_catalog.metrics.raw_metrics (
    device_id STRING,
    metric_name STRING,
    metric_value DOUBLE,
    timestamp TIMESTAMP
) USING iceberg
PARTITIONED BY (days(timestamp));

-- Create the PostgreSQL table for recent data
CREATE TABLE recent_metrics (
    device_id TEXT,
    metric_name TEXT,
    metric_value DOUBLE PRECISION,
    "timestamp" TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE ("timestamp");

-- A range-partitioned table needs at least one partition before it accepts rows
CREATE TABLE recent_metrics_2026_03 PARTITION OF recent_metrics
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- Create the foreign table for Iceberg data
CREATE FOREIGN TABLE historical_metrics (
    device_id TEXT,
    metric_name TEXT,
    metric_value DOUBLE PRECISION,
    "timestamp" TIMESTAMPTZ
) SERVER iceberg_server
OPTIONS (table 'iceberg_catalog.metrics.raw_metrics');
Now for the magic: creating a view that unions both tables. Applications query this view, and PostgreSQL handles routing queries to the appropriate storage layer based on time predicates. One commenter asked about query performance—this is where it gets interesting. PostgreSQL's query planner is smart about partition pruning, but you need to help it with proper constraints and statistics.
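One possible shape for that union view, assuming the table names from the example above and a 30-day cutoff (pick whatever boundary matches your retention job):

```sql
CREATE VIEW all_metrics AS
SELECT device_id, metric_name, metric_value, "timestamp"
FROM recent_metrics
WHERE "timestamp" >= now() - interval '30 days'
UNION ALL
SELECT device_id, metric_name, metric_value, "timestamp"
FROM historical_metrics
WHERE "timestamp" < now() - interval '30 days';
```

The explicit time predicates on each branch are what let the planner skip the foreign-table scan entirely when a query only touches recent data.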
Performance Optimizations That Matter
Performance questions dominated the community discussion, so let me share what actually works based on real testing.
First, partitioning strategy. Iceberg's hidden partitioning is fantastic, but you still need to design it thoughtfully. For time series, I've found that daily partitions work well for most use cases. Monthly partitions can get too large for efficient pruning, while hourly partitions create too many small files. The sweet spot seems to be daily partitions with Z-ordering on device_id or metric_name within each partition.
Second, file sizes matter more than you might think. Iceberg performs best with files between 256MB and 1GB. For time series data, this means you need to tune your compaction strategy. I've had good results with the rewrite_data_files procedure running daily, targeting 512MB files.
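To make the file-size point concrete, here's a toy sketch of the grouping decision a compaction job makes: pack small files into rewrite groups near the target size. (Iceberg's actual `rewrite_data_files` does this for you; this just illustrates the arithmetic.)

```python
def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedy grouping of data files into rewrite groups near target_mb.

    Illustrative only: real compaction also considers partitions,
    delete files, and sort order.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Ten 100 MB files collapse into two ~500 MB rewrite groups.
groups = plan_compaction([100] * 10, target_mb=512)
```

The takeaway: a day's worth of small streaming files should come out the other side as a handful of files in the 256 MB-1 GB band.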
Third, caching. This was a big topic in the discussions. PostgreSQL's buffer cache won't help with Iceberg data, so you need to think about object storage caching. Some teams are using Alluxio or similar caching layers between PostgreSQL and S3. Others are implementing application-level caching for frequently queried historical data. My recommendation: start simple, measure your cache hit rates, and add complexity only when needed.
One engineer shared a brilliant optimization: they store summary statistics (min, max, avg) for each metric-day combination in a separate PostgreSQL table. Queries first check these statistics to determine if they need to scan the full Iceberg data. This reduced their Iceberg query volume by 90% for dashboard use cases.
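The pre-check itself is trivial once the per-day statistics exist. A hedged sketch of the routing decision, with made-up names and a "values above threshold" query as the example:

```python
def days_to_scan(threshold, stats_by_day):
    """Return the days whose full Iceberg data actually needs scanning.

    stats_by_day: {date_str: {"min": float, "max": float}}, i.e. the
    per-metric-day summary rows kept in PostgreSQL. If a day's max is
    below the threshold, no row in that day can match, so skip it.
    """
    return [day for day, s in sorted(stats_by_day.items())
            if s["max"] >= threshold]
```

A dashboard query for "readings above 10.0 last month" first runs this against the cheap PostgreSQL summary table, then issues Iceberg scans only for the days that survive the filter.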
Common Pitfalls and How to Avoid Them
Let's address the concerns raised in the community discussion head-on. These are the mistakes I've seen teams make—and how to avoid them.
Schema Evolution Headaches
Iceberg handles schema evolution beautifully, but there's a catch when integrating with PostgreSQL. If you add a column to your Iceberg table, your foreign table definition in PostgreSQL needs updating. The solution? Use views with explicit column lists rather than SELECT *. Or better yet, use a tool that can sync schema changes automatically. Several teams mentioned building simple schema sync scripts that run as part of their deployment pipeline.
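A schema sync script can be as simple as diffing the two column lists and emitting `ALTER FOREIGN TABLE` statements. This sketch assumes you can fetch both schemas as (name, type) pairs; how you fetch them (the catalog API on one side, `information_schema` on the other) is up to your tooling:

```python
def schema_sync_statements(iceberg_cols, foreign_cols,
                           table="historical_metrics"):
    """Emit ALTER statements for columns present in the Iceberg schema
    but missing from the PostgreSQL foreign table definition.

    Both arguments are lists of (column_name, postgres_type) pairs.
    Type mapping from Iceberg to PostgreSQL is assumed done upstream.
    """
    existing = dict(foreign_cols)
    return [f"ALTER FOREIGN TABLE {table} ADD COLUMN {name} {typ};"
            for name, typ in iceberg_cols if name not in existing]
```

Running this in the deployment pipeline and applying (or at least alerting on) the output keeps the FDW side from silently drifting behind the Iceberg schema.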
Time Zone Confusion
This one bites everyone eventually. PostgreSQL and Iceberg may handle timestamps differently, especially around time zones. Be explicit: store everything in UTC, and convert only at display time. One team spent days debugging query inconsistencies before realizing their Iceberg tables were using local time while PostgreSQL used UTC.
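In code, "store UTC, convert at display time" means timestamps are created timezone-aware in UTC and only rendered in a local zone at the edge:

```python
from datetime import datetime, timezone, timedelta

# Store in UTC...
utc_ts = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)

# ...and convert only when displaying, e.g. for a dashboard in UTC-5.
local = utc_ts.astimezone(timezone(timedelta(hours=-5)))

# Same instant, different wall-clock rendering.
assert local == utc_ts
```

The equivalent discipline on the database side: `TIMESTAMPTZ` columns in PostgreSQL, UTC timestamps in Iceberg, and no naive (zone-less) datetimes anywhere in the pipeline.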
Transaction Management
Iceberg isn't a transactional database in the ACID sense that PostgreSQL is. If you're writing to both systems, you need to think about failure scenarios. The pattern I recommend: write to PostgreSQL first, then asynchronously sync to Iceberg. If the Iceberg write fails, you can retry from the PostgreSQL data. Several commenters mentioned using CDC tools like Debezium for this exact reason—it provides at-least-once delivery semantics that work well for this use case.
Cost Surprises
Object storage is cheap, but API calls aren't free. If you're querying Iceberg tables frequently with poorly optimized queries, you can rack up surprising costs. One team mentioned their S3 API costs exceeded storage costs! The fix: implement query caching, use predicate pushdown effectively, and consider using a query engine like Trino that can optimize Iceberg queries better than PostgreSQL's FDW in some cases.
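It's worth running the request-cost arithmetic before you're surprised by the bill. A back-of-the-envelope estimator (the per-1,000-GET price here is an assumption; check your provider's current pricing):

```python
def monthly_request_cost(queries_per_day, files_scanned_per_query,
                         price_per_1k_get=0.0004):
    """Rough monthly S3 GET-request cost for Iceberg scans.

    Assumes roughly one GET per data file touched; manifest and
    metadata reads add more on top. Price is an assumed example rate.
    """
    gets = queries_per_day * files_scanned_per_query * 30
    return gets * price_per_1k_get / 1000

# 10k queries/day each touching 200 files is 60M GETs a month,
# which is real money before a byte of storage is counted.
cost = monthly_request_cost(10_000, 200)
```

Notice that `files_scanned_per_query` is the lever: predicate pushdown, partition pruning, and compaction all attack that number directly, which is why they cut API costs as well as latency.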
Tools and Ecosystem in 2026
The ecosystem around PostgreSQL and Iceberg has matured significantly. Here's what's available and actually useful.
For data movement, you have several good options. Airbyte and Meltano both have mature PostgreSQL and Iceberg connectors. For real-time sync, Debezium with the Iceberg sink connector works well. If you're building custom pipelines, the Iceberg Java and Python libraries are stable and well-documented.
Query engines worth considering: Trino remains the gold standard for querying Iceberg, but PostgreSQL's FDW has improved dramatically. For pure analytics workloads, I still prefer Trino. For operational queries that need to join Iceberg data with PostgreSQL data, the FDW approach works better.
Monitoring is crucial. You'll want to track file counts, partition sizes, query performance across both systems, and storage costs. The Iceberg REST catalog exposes metrics, and PostgreSQL's statistics views give you what you need on that side. Several teams mentioned building simple Grafana dashboards that combine metrics from both systems.
If you're dealing with scraping time series data from external sources—say, monitoring competitor pricing or collecting public sensor data—you might consider automating this with Apify. Their platform handles the scraping infrastructure, proxy rotation, and data normalization, letting you focus on the time series analysis rather than data collection headaches.
Real-World Use Cases and Lessons
Let me share some actual implementations I've seen or worked on, because nothing beats learning from real experience.
A fintech company processes market data this way. They ingest millions of ticks per minute into PostgreSQL for real-time trading algorithms. Every hour, data older than 24 hours moves to Iceberg. Their data science team queries years of historical data directly from Iceberg for backtesting models. The key lesson they shared: "Start with clear data ownership boundaries. Our trading team owns PostgreSQL, our data science team owns Iceberg queries."
An IoT platform monitors industrial equipment across hundreds of sites. They write sensor data directly to Iceberg, then aggregate it into 5-minute buckets stored in PostgreSQL for operational dashboards. When equipment fails, engineers can query the raw Iceberg data for detailed forensic analysis. Their insight: "Document your data retention policies clearly. We delete from PostgreSQL after 30 days but keep in Iceberg for 7 years. Everyone needs to understand this."
A SaaS company tracks application metrics. They use the tiered storage pattern with a twist: they keep multiple aggregates in PostgreSQL (1-minute, 5-minute, hourly) while the raw data goes to Iceberg. This gives them fast dashboard performance while maintaining the ability to drill down when needed. Their advice: "Invest in data quality checks. We validate data consistency between PostgreSQL and Iceberg daily."
Getting Started: Your Action Plan
Ready to implement this? Here's a practical roadmap based on what's worked for other teams.
Week 1: Set up your Iceberg catalog and object storage. Create a test Iceberg table and connect it to PostgreSQL using FDW. Get comfortable with basic queries.
Week 2: Choose one non-critical time series dataset to migrate. Implement the tiered storage pattern. Create the migration job that moves data from PostgreSQL to Iceberg.
Week 3: Test performance. Run your actual queries against the hybrid setup. Measure latency, throughput, and costs. Tune your partitioning and file sizes.
Week 4: Implement monitoring and alerting. Set up dashboards for file counts, query performance, and data freshness. Document your architecture and operational procedures.
If you need specialized help with any part of this—whether it's database optimization, Iceberg tuning, or building the data pipelines—consider hiring a database specialist on Fiverr. Sometimes bringing in targeted expertise for a few hours can save weeks of trial and error.
For those who want to dive deeper into the technical details, I recommend Designing Data-Intensive Applications. It provides excellent foundational knowledge that will help you make better architecture decisions.
Conclusion: The Future is Hybrid
The PostgreSQL + Iceberg combination for time series isn't just a theoretical exercise—it's a practical solution to real problems teams are facing in 2026. The community discussion shows both excitement and healthy skepticism, which is exactly what you want with new architectural patterns.
What I love about this approach is its pragmatism. You're not throwing away your PostgreSQL expertise or existing tooling. You're extending it with cloud-scale storage when you need it. The learning curve exists, but it's manageable. The operational complexity is real, but tools and patterns are maturing rapidly.
Start small. Pick one time series dataset that's causing you pain. Implement the tiered storage pattern. Measure everything. Learn from what works and what doesn't. The teams having the most success with this aren't the ones with perfect implementations—they're the ones who started experimenting early and iterated based on real usage.
Time series data will only grow in volume and importance. Having an architecture that can scale with that growth while maintaining query performance and cost efficiency isn't just nice to have—it's becoming essential. PostgreSQL and Iceberg together give you a path forward that's both practical and powerful.