High-Level System Architecture of Real-Time Chat Applications
High-Level System Architecture of Real-Time Chat Applications
Hello everyone! In this article, we will take an in-depth look at the possible high-level architecture of real-time messaging giants like WhatsApp and Facebook Messenger.

Imagine a system that serves over 2 billion active users, processes 100 billion messages per day, and is expected to deliver every single one of them in milliseconds. This is not just an app; it is a masterpiece of distributed systems engineering.
Whether you are a software architect, a backend engineer, or a tech enthusiast, understanding how these systems scale to handle such massive concurrency is fascinating. In this paper, we will dissect the technological processes behind such a platform and propose a basic high-level architecture.
Introduction
Serving billions of users worldwide, chat applications require a dynamic system architecture to meet the user’s expectation of “instant” communication. Unlike standard web applications where a simple HTTP Request/Response model suffices, chat apps require persistent connections and real-time data flow.
The purpose of this article is to discuss how such a platform is designed, how its main components function, and how it handles the C10M problem (handling 10 million concurrent connections).
System Requirements
The system architecture of a global chat application is crafted to accommodate extensive demands. When designing a system like WhatsApp, we must satisfy specific functional and non-functional requirements.
Non-Functional Requirements (The Engineering Challenges):
- Low Latency: This is the most critical factor. Users expect real-time delivery. A delay of even a few seconds is noticeable and annoying.
- High Availability: The system must be available 99.999% of the time. In the context of the CAP Theorem, chat applications usually prioritize Availability and Partition Tolerance (AP), though consistency is eventually required (Eventual Consistency).
- High Scalability: The system must handle traffic spikes (e.g., New Year’s Eve) without degrading performance.
- Reliability: We cannot lose data. The system must guarantee at-least-once delivery.
High-Level Architecture
Possible High-Level Architecture of a Chat Application

The diagram above represents a visual summary of the service architecture. To better understand the role and functionality of each component, we will discuss the key components individually below.
1. Connection Management & Protocols
This is where chat apps differ significantly from standard websites.
Why HTTP is not enough:
In a standard web app, the client asks for data, and the server responds. However, in a chat app, the server must be able to push a message to the client without the client asking for it. Using HTTP Polling (asking the server “Do I have new messages?” every second) is incredibly inefficient and drains the user’s battery.
The Solution: Persistent Sockets
- WebSocket / Long-Lived TCP: The client establishes a persistent connection with the server. This “tunnel” stays open, allowing bi-directional communication.
- Protocols:
- XMPP (Extensible Messaging and Presence Protocol): WhatsApp uses a heavily customized, compressed binary version of XMPP to minimize data usage.
- MQTT (Message Queuing Telemetry Transport): Facebook Messenger often utilizes MQTT, a lightweight protocol designed for IoT, which is excellent for saving battery and bandwidth on mobile devices.
2. The Chat Gateway (The Entry Point)
The Chat Gateway is a cluster of servers that maintain these open connections. This is the most resource-intensive part of the system regarding RAM and CPU concurrency.
The Erlang Advantage:
It is widely known that WhatsApp built its infrastructure on Erlang (and the BEAM VM). Erlang is designed for telecommunications; it is lightweight and handles concurrency exceptionally well.
By optimizing Erlang, WhatsApp engineers were famously able to handle over 2 million concurrent TCP connections on a single server box. [1]
3. Database Architecture
How do you store billions of messages a day? A standard SQL database (like PostgreSQL or MySQL) will struggle to handle the write throughput required at this scale without massive complexity.
The Choice: NoSQL & Wide-Column Stores
The nature of chat data is unique:
- Read/Write Ratio: It is almost 50/50 (unlike Instagram, which is read-heavy).
- Access Pattern: Users mostly look at recent chats. Old chats are rarely accessed.
Therefore, databases like Apache Cassandra or HBase are the standard choices.
- Facebook Messenger historically used HBase.
- Discord shifted from Cassandra to ScyllaDB to reduce latency [2].
Sharding (Partitioning):
To scale, the database is horizontally split into “shards.”
- Partition Key: Usually
UserIDorConversationID. This ensures that all messages for a specific chat reside on the same database node, making retrieval fast.
4. Message Queues (Asynchronous Processing)
When User A sends a message to User B, we don’t just write it to the database immediately. We pass it through a Message Queue.
Kafka is the industry standard here. It acts as a buffer. If the database is experiencing high load, Kafka holds the messages until the database catches up, preventing data loss. It decouples the message ingestion service from the message processing service.
5. Media Storage (Images & Videos)
We never store binary files (images/videos) in the primary database. It ruins performance.
The Flow:
- The App compresses the image and uploads it to an Object Store (like Amazon S3).
- The Object Store returns a URL (e.g.,
https://cdn.chat.com/img123.jpg). - The chat message sent to the recipient contains only this URL, not the image itself.
- CDN (Content Delivery Network): As mentioned in the Booking.com architecture, CDNs are used to cache this content on servers geographically closer to the user for faster download speeds.
Communication Flow: Sending a Message
Let’s trace the journey of a message from Alice to Bob.
- Establish Connection: Alice’s phone maintains an open WebSocket connection to the Chat Gateway.
- Send: Alice sends a message (“Hello”). The Gateway receives it.
- Process: The Gateway pushes the message to a Kafka topic.
- Persist: A “Persist Service” reads from Kafka and saves the message to the Cassandra database.
- Route: A “Routing Service” checks Redis (Cache) to see if Bob is currently Online.
- Scenario A (Bob is Online): The service identifies which Gateway server Bob is connected to and pushes the message instantly via his open socket.
- Scenario B (Bob is Offline): The service triggers a request to the Notification Service, which sends a Push Notification via Apple (APNS) or Google (FCM).
Global Request Management
Just like Booking.com uses HAProxy, chat apps use global Load Balancers to route traffic. If a user is in Tokyo, DNS routing ensures they connect to the nearest Data Center in Asia, not a server in Virginia. This minimizes network latency.
Conclusion
In conclusion, we have examined the high-level architecture of massive chat applications. The secret sauce lies not just in the code, but in the architectural decisions: using Persistent Connections instead of HTTP, choosing NoSQL for write-heavy workloads, and leveraging Erlang or Go for high concurrency.
As with any technological review, these architectures evolve. Thank you very much for taking your valuable time to read my article. I hope this has deepened your understanding of how systems like WhatsApp work under the hood.
References
[1] R. Reed, “Scaling to Millions of Simultaneous Connections,” Erlang Factory SF Bay Area, 2014. [Video].
[2] B. Hiltpolt, “How Discord Stores Billions of Messages,” Discord Engineering Blog, 2017.
[3] A. Xu, System Design Interview: An Insider’s Guide, 2020.
[4] “Building Mobile-First Infrastructure for Messenger,” Facebook Engineering, 2014. https://engineering.fb.com/
← PostgreSQL Blog