How PostgreSQL Is Scaled to Serve 800 Million ChatGPT Users
A few days ago, I read Bohan Zhang's OpenAI engineering article, Scaling PostgreSQL to 800 million ChatGPT users. I will be honest: the architecture it describes challenged many of my assumptions about database scalability.

In modern system design discussions, once traffic reaches the scale of ChatGPT, the conversation usually ends quickly. Everything must be sharded. Relational databases become a liability. Eventually, the conclusion is to move entirely to distributed NoSQL systems.
Yet OpenAI did something that feels almost counter-cultural in today’s engineering climate. At the heart of one of the fastest-growing applications in history, they still run PostgreSQL with a single primary writer, and they manage millions of queries per second without catastrophic bottlenecks.
This article is not a rewrite of OpenAI’s post. It is a technical reflection on what they built, why it works, where its limits are, and what lessons senior engineers can realistically take from it.
Reference: Scaling PostgreSQL to 800 million ChatGPT users — OpenAI Engineering Blog
The First Reaction: This Should Not Work
If you describe this architecture without context, it sounds fragile.
- A single primary node handling all writes
- Dozens of read replicas
- PostgreSQL, not a custom distributed storage engine
- No universal sharding strategy
On paper, this looks like a system waiting to collapse under write contention and replication pressure. In practice, it does not. The key insight is that OpenAI did not try to force PostgreSQL to be something it is not.
They reshaped the workload around the database, not the other way around.
A Clear Definition of PostgreSQL’s Role
One of the most important architectural decisions is not technical, but organizational.
PostgreSQL is no longer treated as a general-purpose sink for all data. It has a clearly defined responsibility.
PostgreSQL is used only when:
- Strong relational guarantees are required
- Transactions and consistency matter more than raw write throughput
- The data model is difficult or dangerous to shard
PostgreSQL is explicitly avoided when:
- Write volume is high and horizontally partitionable
- Event-style or append-heavy data is involved
- Eventual consistency is acceptable
Write-heavy and shard-friendly workloads are moved to systems like Azure Cosmos DB. This decision alone removes the most common cause of failure in large PostgreSQL deployments: uncontrolled write growth on a single primary.
Why a Single Primary Is Still Viable
PostgreSQL’s single-writer limitation is real. MVCC creates multiple row versions, increases WAL traffic, and demands continuous vacuuming. At scale, this can become painful.
OpenAI does not eliminate this bottleneck. They bound it.
What makes the single primary sustainable
- Write volume is aggressively capped
- No new write-heavy tables are added casually
- Schema evolution is tightly controlled
- The primary does not serve read traffic
Instead of pretending PostgreSQL can scale writes infinitely, they engineered the system so it never has to try.
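In practice, keeping reads off the primary is something the application layer has to enforce. Here is a minimal sketch of that split, assuming psycopg2 and hypothetical connection strings and table names; the article does not describe OpenAI's actual routing code.

```python
import psycopg2

# Hypothetical DSNs; a real deployment would resolve these from config or
# service discovery.
PRIMARY_DSN = "host=pg-primary dbname=app user=app"
REPLICA_DSNS = [
    "host=pg-replica-us dbname=app user=app",
    "host=pg-replica-eu dbname=app user=app",
]

def get_connection(read_only: bool, replica_index: int = 0):
    """Send writes to the primary and everything else to a replica."""
    if read_only:
        conn = psycopg2.connect(REPLICA_DSNS[replica_index % len(REPLICA_DSNS)])
        conn.set_session(readonly=True)  # fail fast if a write sneaks through
        return conn
    return psycopg2.connect(PRIMARY_DSN)

# Example: a read lands on a replica; only explicit writes touch the primary.
with get_connection(read_only=True) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM conversations")  # hypothetical table
    print(cur.fetchone())
```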
Read Scaling Done the Right Way
Read traffic is where this architecture truly shines.
OpenAI operates more than 50 read replicas to serve global traffic. Latency-sensitive queries are routed geographically, and the primary is shielded from read amplification.
However, this introduces another problem: replication overhead.
Every replica streams WAL from an upstream node. When all of them stream directly from the primary, the primary itself risks becoming a replication bottleneck as the fleet grows.
Cascading Replication as a Practical Extension
To avoid overwhelming the primary, OpenAI uses cascading replication.
Some replicas receive WAL directly from the primary, while others replicate from those replicas instead. This transforms replication cost from linear to hierarchical.
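Cascading replication is a native PostgreSQL capability: a standby whose primary_conninfo points at another standby streams WAL from that standby rather than from the primary. Below is a small sketch of how the resulting topology can be inspected, assuming psycopg2, hypothetical host names, and a role with monitoring privileges; pg_stat_replication on each node lists only the standbys streaming directly from it.

```python
import psycopg2

# Hypothetical hosts: the primary plus one mid-tier replica that feeds
# further "leaf" replicas via cascading replication.
NODES = ["pg-primary", "pg-replica-hub-eu"]

for host in NODES:
    with psycopg2.connect(host=host, dbname="app", user="monitor") as conn:
        with conn.cursor() as cur:
            # Each node reports only its direct downstream standbys, so the
            # primary's WAL fan-out stays small as the total fleet grows.
            cur.execute(
                "SELECT application_name, client_addr, state, sync_state "
                "FROM pg_stat_replication"
            )
            print(host, "streams WAL to:", cur.fetchall())
```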
Benefits
- The primary’s CPU and network usage remain stable
- Replica count can grow well beyond what direct streaming from a single primary could support
- Read capacity increases without write-side penalties
This is a pragmatic extension of native PostgreSQL features rather than a radical redesign.
Workload Isolation and the Noisy Neighbor Problem
One of the most realistic parts of the architecture is how it handles human error.
In large organizations, someone will eventually write a bad query.
OpenAI assumes this and designs around it.
Isolation strategies
- High-priority queries are routed to protected instances
- Low-priority or experimental workloads are isolated
- Query routing rules are enforced at the application layer
This ensures that one inefficient query cannot degrade the entire platform.
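The article does not publish the routing rules themselves, but the pattern is easy to sketch at the application layer: each workload tier gets its own instances, and the lower tiers get hard limits. The tier names, hosts, and timeouts below are hypothetical.

```python
import psycopg2

# Hypothetical tier-to-instance mapping: critical traffic never shares an
# instance with batch or ad-hoc workloads.
TIER_DSNS = {
    "critical": "host=pg-replica-critical dbname=app user=app",
    "batch":    "host=pg-replica-batch dbname=app user=app",
    "adhoc":    "host=pg-replica-adhoc dbname=app user=app",
}

# Hypothetical per-tier limits on how long a single query may run (ms).
TIER_TIMEOUT_MS = {"critical": 2000, "batch": 30000, "adhoc": 10000}

def connect_for(tier: str):
    """Connect to the tier's dedicated instance and cap query runtime,
    so one inefficient query cannot monopolize a shared resource."""
    conn = psycopg2.connect(TIER_DSNS[tier])
    conn.autocommit = True
    with conn.cursor() as cur:
        # The value comes from our own constant table, never from user input.
        cur.execute(f"SET statement_timeout = {TIER_TIMEOUT_MS[tier]}")
    return conn
```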
Connection Pooling Is Not Optional at This Scale
At millions of requests per second, database connections themselves become a scaling problem.
PostgreSQL cannot handle tens of thousands of concurrent connections efficiently. OpenAI relies heavily on PgBouncer to collapse connection counts and stabilize latency.
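From the application's point of view the change is small: connect to PgBouncer instead of PostgreSQL, and let the pooler multiplex many cheap client connections onto a few real server connections. A rough sketch follows, with illustrative pooler settings shown as comments; the values are not OpenAI's.

```python
import psycopg2

# Illustrative pgbouncer.ini settings (not OpenAI's actual values):
#   [pgbouncer]
#   pool_mode = transaction    ; a server connection is held only per transaction
#   max_client_conn = 20000    ; many cheap client-side connections
#   default_pool_size = 50     ; few real PostgreSQL connections per db/user pair

# The application connects to PgBouncer (default port 6432), not to PostgreSQL.
POOLED_DSN = "host=pgbouncer.internal port=6432 dbname=app user=app"

with psycopg2.connect(POOLED_DSN) as conn, conn.cursor() as cur:
    # In transaction pooling mode, avoid session-level state such as
    # session-scoped prepared statements, advisory locks, or temp tables.
    cur.execute("SELECT 1")
    print(cur.fetchone())
```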
Key outcomes
- Orders-of-magnitude fewer active connections
- Faster request handling
- Protection against connection storms during traffic spikes
This is not an optimization. It is a requirement.
Cache Miss Storms and Defensive Caching
Caching is often discussed as a performance optimization. At this scale, it is a safety mechanism.
A single hot cache key expiring can generate thousands of concurrent database hits. OpenAI mitigates this using request-level locking and leasing.
Only one request repopulates the cache. Others wait. The database load remains bounded even during sudden bursts.
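One common way to implement that locking-and-leasing behavior is a short-lived lease stored in the cache itself, so only the lease holder touches the database. Here is a minimal single-flight sketch using Redis; the key names, TTLs, and retry loop are assumptions, not OpenAI's implementation.

```python
import time
import redis

r = redis.Redis()  # assumes a local Redis; hostnames are illustrative

def get_with_single_flight(key, ttl_s, lease_s, load_from_db):
    """Return the cached value, letting only one caller repopulate it."""
    value = r.get(key)
    if value is not None:
        return value

    # Try to take a short lease; only the winner queries the database.
    if r.set(f"lease:{key}", "1", nx=True, ex=lease_s):
        try:
            value = load_from_db()           # exactly one database hit
            r.set(key, value, ex=ttl_s)
            return value
        finally:
            r.delete(f"lease:{key}")

    # Everyone else waits briefly for the winner instead of stampeding.
    for _ in range(50):
        time.sleep(0.05)
        value = r.get(key)
        if value is not None:
            return value
    return load_from_db()  # last-resort fallback if the winner failed
```

The waiting loop is deliberately crude; the point is that a burst of concurrent misses produces one database query, not thousands.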
Simple idea, massive impact.
Schema Changes Treated as Production Incidents
Schema migrations are one of the most underestimated risks in large systems.
OpenAI treats them with zero tolerance.
Rules
- Schema changes must complete within five seconds
- Table rewrites are forbidden
- Indexes must be created concurrently
- Long-running DDL is canceled automatically
These constraints force engineers to design safer schemas upfront and avoid risky changes later.
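Those rules map directly onto how DDL is issued. A minimal sketch, assuming psycopg2 and a hypothetical table and index: lock_timeout keeps the statement from queueing behind other locks for more than five seconds, and CREATE INDEX CONCURRENTLY, which must run outside a transaction block, builds the index without blocking writes.

```python
import psycopg2

conn = psycopg2.connect("host=pg-primary dbname=app user=app")  # hypothetical DSN
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction

with conn.cursor() as cur:
    # Give up after five seconds instead of queueing behind other locks
    # and stalling production traffic.
    cur.execute("SET lock_timeout = '5s'")

    # Build the index without taking a write-blocking lock on the table.
    # Table and index names are hypothetical.
    cur.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_messages_user_id "
        "ON messages (user_id)"
    )
conn.close()
```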
Strengths of This Approach
Major advantages
- High availability with familiar, battle-tested technology
- Strong consistency where it actually matters
- Lower operational complexity than fully sharded systems
- Easier debugging and data correctness guarantees
This is boring engineering in the best possible way.
Limitations and Trade-offs
This architecture is not universally applicable.
Real constraints
- Write throughput is fundamentally limited
- Requires strong organizational discipline
- Heavy governance around schema and data ownership
- Not suitable for workloads with unbounded writes
If your system is fundamentally write-heavy, this model will not save you.
Final Thoughts: What This Architecture Really Teaches
The most important lesson from OpenAI’s PostgreSQL scaling story is not about PostgreSQL itself.
It is about restraint.
Instead of reaching for more complex technologies, OpenAI constrained usage, enforced discipline, and built protective layers around a well-understood system.
PostgreSQL did not magically scale to 800 million users. The workload was engineered so that it never asked PostgreSQL to do the wrong job.
Before sharding everything or abandoning relational databases, it is worth asking:
Are we hitting real database limits, or are we simply using the database without discipline?
Reference:
Scaling PostgreSQL to 800 million ChatGPT users, OpenAI Engineering Blog