How PostgreSQL Is Scaled to Serve 800 Million ChatGPT Users
A few days ago, I read Bohan Zhang's OpenAI engineering article, Scaling PostgreSQL to 800 million ChatGPT users. I will be honest: the architecture it describes challenged many of my assumptions about database scalability.

In modern system design discussions, once traffic reaches the scale of ChatGPT, the conversation usually ends quickly. Everything must be sharded. Relational databases become a liability. Eventually, the conclusion is to move entirely to distributed NoSQL systems.
Yet OpenAI did something that feels almost counter-cultural in today’s engineering climate. At the heart of one of the fastest-growing applications in history, they still run PostgreSQL with a single primary writer, and they manage millions of queries per second without catastrophic bottlenecks.
This article is not a rewrite of OpenAI’s post. It is a technical reflection on what they built, why it works, where its limits are, and what lessons senior engineers can realistically take from it.
Reference: Scaling PostgreSQL to 800 million ChatGPT users — OpenAI Engineering Blog
The First Reaction: This Should Not Work
If you describe this architecture without context, it sounds fragile.
- A single primary node handling all writes
- Dozens of read replicas
- PostgreSQL, not a custom distributed storage engine
- No universal sharding strategy
On paper, this looks like a system waiting to collapse under write contention and replication pressure. In practice, it does not. The key insight is that OpenAI did not try to force PostgreSQL to be something it is not.
They reshaped the workload around the database, not the other way around.
A Clear Definition of PostgreSQL’s Role
One of the most important architectural decisions is not technical, but organizational.
PostgreSQL is no longer treated as a general-purpose sink for all data. It has a clearly defined responsibility.
PostgreSQL is used only when:
- Strong relational guarantees are required
- Transactions and consistency matter more than raw write throughput
- The data model is difficult or dangerous to shard
PostgreSQL is explicitly avoided when:
- Write volume is high and horizontally partitionable
- Event-style or append-heavy data is involved
- Eventual consistency is acceptable
Write-heavy and shard-friendly workloads are moved to systems like Azure Cosmos DB. This decision alone removes the most common cause of failure in large PostgreSQL deployments: uncontrolled write growth on a single primary.
Why a Single Primary Is Still Viable
PostgreSQL’s single-writer limitation is real. MVCC creates multiple row versions, increases WAL traffic, and demands continuous vacuuming. At scale, this can become painful.
OpenAI does not eliminate this bottleneck. They bound it.
What makes the single primary sustainable
- Write volume is aggressively capped
- No new write-heavy tables are added casually
- Schema evolution is tightly controlled
- The primary does not serve read traffic
Instead of pretending PostgreSQL can scale writes infinitely, they engineered the system so it never has to try.
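In practice, keeping reads off the primary is something the application layer has to enforce. Here is a minimal sketch of that split, assuming psycopg2 and hypothetical connection strings and table names; the article does not describe OpenAI's actual routing code.

```python
import psycopg2

# Hypothetical DSNs; a real deployment would resolve these from config or
# service discovery.
PRIMARY_DSN = "host=pg-primary dbname=app user=app"
REPLICA_DSNS = [
    "host=pg-replica-us dbname=app user=app",
    "host=pg-replica-eu dbname=app user=app",
]

def get_connection(read_only: bool, replica_index: int = 0):
    """Send writes to the primary and everything else to a replica."""
    if read_only:
        conn = psycopg2.connect(REPLICA_DSNS[replica_index % len(REPLICA_DSNS)])
        conn.set_session(readonly=True)  # fail fast if a write sneaks through
        return conn
    return psycopg2.connect(PRIMARY_DSN)

# Example: a read lands on a replica; only explicit writes touch the primary.
with get_connection(read_only=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM conversations")  # hypothetical table
    print(cur.fetchone())
```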
Read Scaling Done the Right Way
Read traffic is where this architecture truly shines.
OpenAI operates more than 50 read replicas to serve global traffic. Latency-sensitive queries are routed geographically, and the primary is shielded from read amplification.
However, this introduces another problem: replication overhead.
Every replica streams WAL from an upstream node. When all of them stream directly from the primary, the primary itself risks becoming a replication bottleneck as the fleet grows.
Cascading Replication as a Practical Extension
To avoid overwhelming the primary, OpenAI uses cascading replication.
Some replicas receive WAL directly from the primary, while others replicate from those replicas instead. This transforms replication cost from linear to hierarchical.
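Cascading replication is a native PostgreSQL capability: a standby whose primary_conninfo points at another standby streams WAL from that standby rather than from the primary. Below is a small sketch of how the resulting topology can be inspected, assuming psycopg2, hypothetical host names, and a role with monitoring privileges; pg_stat_replication on each node lists only the standbys streaming directly from it.

```python
import psycopg2

# Hypothetical hosts: the primary plus one mid-tier replica that feeds
# further "leaf" replicas via cascading replication.
NODES = ["pg-primary", "pg-replica-hub-eu"]

for host in NODES:
    with psycopg2.connect(host=host, dbname="app", user="monitor") as conn:
        with conn.cursor() as cur:
            # Each node reports only its direct downstream standbys, so the
            # primary's WAL fan-out stays small as the total fleet grows.
            cur.execute(
                "SELECT application_name, client_addr, state, sync_state "
                "FROM pg_stat_replication"
            )
            print(host, "streams WAL to:", cur.fetchall())
```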
Benefits
- The primary’s CPU and network usage remain stable
- Replica count can grow well beyond what direct streaming from a single primary could support
- Read capacity increases without write-side penalties
This is a pragmatic extension of native PostgreSQL features rather than a radical redesign.
Workload Isolation and the Noisy Neighbor Problem
One of the most realistic parts of the architecture is how it handles human error.
In large organizations, someone will eventually write a bad query.
OpenAI assumes this and designs around it.
Isolation strategies
- High-priority queries are routed to protected instances
- Low-priority or experimental workloads are isolated
- Query routing rules are enforced at the application layer
This ensures that one inefficient query cannot degrade the entire platform.
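The article does not publish the routing rules themselves, but the pattern is easy to sketch at the application layer: each workload tier gets its own instances, and the lower tiers get hard limits. The tier names, hosts, and timeouts below are hypothetical.

```python
import psycopg2

# Hypothetical tier-to-instance mapping: critical traffic never shares an
# instance with batch or ad-hoc workloads.
TIER_DSNS = {
    "critical": "host=pg-replica-critical dbname=app user=app",
    "batch":    "host=pg-replica-batch dbname=app user=app",
    "adhoc":    "host=pg-replica-adhoc dbname=app user=app",
}

# Hypothetical per-tier limits on how long a single query may run (ms).
TIER_TIMEOUT_MS = {"critical": 2000, "batch": 30000, "adhoc": 10000}

def connect_for(tier: str):
    """Connect to the tier's dedicated instance and cap query runtime,
    so one inefficient query cannot monopolize a shared resource."""
    conn = psycopg2.connect(TIER_DSNS[tier])
    conn.autocommit = True
    with conn.cursor() as cur:
        # The value comes from our own constant table, never from user input.
        cur.execute(f"SET statement_timeout = {TIER_TIMEOUT_MS[tier]}")
    return conn
```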
Connection Pooling Is Not Optional at This Scale
At millions of requests per second, database connections themselves become a scaling problem.
PostgreSQL cannot handle tens of thousands of concurrent connections efficiently. OpenAI relies heavily on PgBouncer to collapse connection counts and stabilize latency.
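From the application's point of view the change is small: connect to PgBouncer instead of PostgreSQL, and let the pooler multiplex many cheap client connections onto a few real server connections. A rough sketch follows, with illustrative pooler settings shown as comments; the values are not OpenAI's.

```python
import psycopg2

# Illustrative pgbouncer.ini settings (not OpenAI's actual values):
#   [pgbouncer]
#   pool_mode = transaction    ; a server connection is held only per transaction
#   max_client_conn = 20000    ; many cheap client-side connections
#   default_pool_size = 50     ; few real PostgreSQL connections per db/user pair

# The application connects to PgBouncer (default port 6432), not to PostgreSQL.
POOLED_DSN = "host=pgbouncer.internal port=6432 dbname=app user=app"

with psycopg2.connect(POOLED_DSN) as conn, conn.cursor() as cur:
    # In transaction pooling mode, avoid session-level state such as
    # session-scoped prepared statements, advisory locks, or temp tables.
    cur.execute("SELECT 1")
    print(cur.fetchone())
```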
Key outcomes
- Orders-of-magnitude fewer active connections
- Faster request handling
- Protection against connection storms during traffic spikes
This is not an optimization. It is a requirement.
Cache Miss Storms and Defensive Caching
Caching is often discussed as a performance optimization. At this scale, it is a safety mechanism.
A single hot cache key expiring can generate thousands of concurrent database hits. OpenAI mitigates this using request-level locking and leasing.
Only one request repopulates the cache. Others wait. The database load remains bounded even during sudden bursts.
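One common way to implement that locking-and-leasing behavior is a short-lived lease stored in the cache itself, so only the lease holder touches the database. Here is a minimal single-flight sketch using Redis; the key names, TTLs, and retry loop are assumptions, not OpenAI's implementation.

```python
import time
import redis

r = redis.Redis()  # assumes a local Redis; hostnames are illustrative

def get_with_single_flight(key, ttl_s, lease_s, load_from_db):
    """Return the cached value, letting only one caller repopulate it."""
    value = r.get(key)
    if value is not None:
        return value

    # Try to take a short lease; only the winner queries the database.
    if r.set(f"lease:{key}", "1", nx=True, ex=lease_s):
        try:
            value = load_from_db()           # exactly one database hit
            r.set(key, value, ex=ttl_s)
            return value
        finally:
            r.delete(f"lease:{key}")

    # Everyone else waits briefly for the winner instead of stampeding.
    for _ in range(50):
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value
    return load_from_db()  # last-resort fallback if the winner failed
```

The waiting loop is deliberately crude; the point is that a burst of concurrent misses produces one database query, not thousands.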
Simple idea, massive impact.
Schema Changes Treated as Production Incidents
Schema migrations are one of the most underestimated risks in large systems.
OpenAI treats them with zero tolerance.
Rules
- Schema changes must complete within five seconds
- Table rewrites are forbidden
- Indexes must be created concurrently
- Long-running DDL is canceled automatically
These constraints force engineers to design safer schemas upfront and avoid risky changes later.
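Those rules map directly onto how DDL is issued. A minimal sketch, assuming psycopg2 and a hypothetical table and index: lock_timeout keeps the statement from queueing behind other locks for more than five seconds, and CREATE INDEX CONCURRENTLY, which must run outside a transaction block, builds the index without blocking writes.

```python
import psycopg2

conn = psycopg2.connect("host=pg-primary dbname=app user=app")  # hypothetical DSN
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction

with conn.cursor() as cur:
    # Give up after five seconds instead of queueing behind other locks
    # and stalling production traffic.
    cur.execute("SET lock_timeout = '5s'")

    # Build the index without taking a write-blocking lock on the table.
    # Table and index names are hypothetical.
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_user_id "
        "ON messages (user_id)"
    )
conn.close()
```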
Strengths of This Approach
Major advantages
- High availability with familiar, battle-tested technology
- Strong consistency where it actually matters
- Lower operational complexity than fully sharded systems
- Easier debugging and data correctness guarantees
This is boring engineering in the best possible way.
Limitations and Trade-offs
This architecture is not universally applicable.
Real constraints
- Write throughput is fundamentally limited
- Requires strong organizational discipline
- Heavy governance around schema and data ownership
- Not suitable for workloads with unbounded writes
If your system is fundamentally write-heavy, this model will not save you.
Final Thoughts: What This Architecture Really Teaches
The most important lesson from OpenAI’s PostgreSQL scaling story is not about PostgreSQL itself.
It is about restraint.
Instead of reaching for more complex technologies, OpenAI constrained usage, enforced discipline, and built protective layers around a well-understood system.
PostgreSQL did not magically scale to 800 million users. The workload was engineered so that it never asked PostgreSQL to do the wrong job.
Before sharding everything or abandoning relational databases, it is worth asking:
Are we hitting real database limits, or are we simply using the database without discipline?
Reference:
Scaling PostgreSQL to 800 million ChatGPT users, OpenAI Engineering Blog