Software Architect

Magdalena Jackiewicz

Editorial Expert

Reviewed by a tech expert

Engineering a scalable backend for a messaging app like WhatsApp – 4 key principles

Chat applications typically serve high and fluctuating numbers of concurrent users, anywhere from thousands to millions, and some of those users will use your app far more heavily than others. Providing chat functionality that is reliable and delights end users is a complex challenge. Here are four key principles to follow when engineering a real-time chat backend that scales and doesn’t crash.

Principle 1: Ensure a reliable connection

When you plan to build a large-scale chat app, the first thing to take into account is providing a reliable connection to end users. Messaging apps like WhatsApp, Slack or Discord handle hundreds of thousands to millions of connections per node. That isn’t easy to achieve at that scale because of system limitations, network latency and the heterogeneous networks between servers and clients.

Temporary disconnections are especially common on mobile devices, where network problems are frequent. When the network fluctuates suddenly, a TCP connection can fall into a half-open state, in which one end of the connection doesn’t know that the other end is unavailable. How do you tackle this issue?

Actions:

Use the geolocation of your users to connect them to the nearest datacenter and minimize latency. Amazon Route 53 can help here: this DNS service provides geolocation routing, which lets you choose the resources that serve traffic based on users’ geographic location. With it, you can route queries from a specific region to a load balancer located in a specific city.
Provide a gateway cluster, separate from the application machines, that is responsible for connection handling.
Implement connection failure detection in both your client and server applications (e.g. keepalives or pings).
Implement some form of message stream management to negotiate which incoming and outgoing messages didn’t reach the client and/or server because of connection problems (e.g. XEP-0198, the XMPP Stream Management extension).
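
To make the stream management idea more concrete, here is a minimal sketch loosely modelled on the acknowledgement counters of XEP-0198. It is not the XMPP wire protocol: the class name `OutboundStream` and its methods are hypothetical, the socket is left out entirely, and the counter wraparound that a real implementation needs is ignored. The sender keeps every message until the peer confirms how many it has handled, so that after a reconnect only the unconfirmed messages are sent again.

```python
from collections import deque


class OutboundStream:
    """Tracks sent messages until the peer acknowledges them (hypothetical helper)."""

    def __init__(self):
        self.sent_count = 0      # total messages handed to the connection so far
        self.acked_count = 0     # how many of those the peer has confirmed handling
        self.unacked = deque()   # sent but not yet confirmed, oldest first

    def send(self, message):
        """Record a message as sent; a real client would also write it to the socket."""
        self.sent_count += 1
        self.unacked.append(message)

    def on_ack(self, handled_count):
        """The peer reports its total handled count; drop the confirmed messages."""
        if handled_count < self.acked_count or handled_count > self.sent_count:
            raise ValueError("peer reported an impossible handled count")
        for _ in range(handled_count - self.acked_count):
            self.unacked.popleft()
        self.acked_count = handled_count

    def pending_after_resume(self, handled_count):
        """After a reconnect, return the messages that must be sent again."""
        self.on_ack(handled_count)
        return list(self.unacked)


stream = OutboundStream()
for text in ["hi", "are you there?", "call me"]:
    stream.send(text)

stream.on_ack(1)                            # peer confirmed the first message
to_resend = stream.pending_after_resume(2)  # connection dropped; peer had handled two
print(to_resend)                            # only the third message is still pending
```

The same bookkeeping would run in both directions, so the server can also replay what the client missed while it was offline.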

Principle 2: Ensure efficient message fanout

The main responsibility of a chat backend is delivering messages, presence updates and notifications between senders and receivers in real time. These can be one-to-one direct messages, one-to-many group messages, presence changes broadcast to subscribers, and notifications such as chat states or message statuses.

WhatsApp handles ~20B inbound and ~40B outbound messages per day. To handle that volume while staying reliable and available, you have to split traffic across multiple servers. At this scale, making servers communicate with each other so that senders reach recipients who are likely connected to a different server instance (or several instances, in the case of group messaging), or even a different datacenter, is a real challenge. Here’s how to tackle it.

Actions:

Choose the right cluster topology for your use case. Full mesh (each node connected to every other node) is the most basic pattern, but it doesn’t work well at large scale because of the server resources and network traffic required to maintain so many inter-node connections. WhatsApp uses a pattern called “meta-clustering”, which limits the size of any single cluster by splitting it into functional groups and providing mesh connections only within those groups. You can also use a broker to reduce inter-node connections, e.g. Kafka or a combination of SQS and SNS.
Create a cross-cluster shared session registry so you can easily look up the server to which a given session is connected. There are several ways to achieve that, but they fall into two groups:

coordinated (global): this approach relies on locking nodes during session registration. WhatsApp has a specialized cluster of nodes for its session registry, which uses a coordinated approach. Slack works in a similar way and uses a dedicated channel server cluster (everything is a channel in Slack) to map message receivers to specific session nodes. In your own application, you can start with a distributed Redis cluster as the session registry, which should be enough at small and medium scale.
eventually consistent: this approach uses techniques such as consistent hashing, conflict-free replicated data types (CRDTs) or Kademlia.

Minimize latency on your hot path by avoiding long-running, synchronous operations when processing messages. To guarantee low message latency and better reliability, it is crucial to add as little overhead as possible to message handling.
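
As an illustration of the consistent hashing technique mentioned for the session registry above, the sketch below maps session IDs to nodes on a hash ring. It is a simplified example rather than how any of the named products implement it: the `HashRing` class, the node names and the choice of 100 virtual nodes per server are all assumptions made for the demo.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping session IDs to nodes (illustrative names)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes  # virtual points per node, to spread load evenly
        self._ring = []       # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256 as an integer; stable across processes.
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [point for point in self._ring if point[1] != node]

    def node_for(self, session_id):
        """Return the node owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (self._hash(session_id), ""))
        return self._ring[idx % len(self._ring)][1]


sessions = [f"session-{n}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {s: ring.node_for(s) for s in sessions}

ring.remove_node("node-b")  # simulate a node leaving the cluster
after = {s: ring.node_for(s) for s in sessions}

moved = [s for s in sessions if before[s] != after[s]]
print(f"{len(moved)} of {len(sessions)} sessions were remapped")
```

The point of the design is that only the sessions owned by the departed node are remapped. Every other lookup keeps returning the same node, without any coordination between servers.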

Principle 3: Choose the right database

Data storage is another key aspect to take care of when building a real-time messaging app. Large chat apps handle hundreds of millions to billions of messages per day, and each message should be stored and replicated globally to provide chat history for all interested users.

Most chat apps have an overall read/write ratio of about 50/50, so you cannot optimize the database purely for writes or purely for reads. When selecting a database, you also have to consider requirements such as linear scalability, automatic failover, low maintenance, predictable performance and easy distribution across the cluster.

Actions:

Choose a database engine that fulfils your application’s requirements. If you operate at large scale, you should probably consider a database based on Amazon’s Dynamo architecture, such as DynamoDB, Cassandra, Riak or ScyllaDB. For instance, Discord, one of the big chat companies, used Cassandra and later migrated to ScyllaDB. You can also use a traditional RDBMS such as MySQL or Postgres, but you should add an external layer to handle low-latency cross-datacenter replication. Amazon Aurora on AWS provides that kind of layer. Slack, on the other hand, uses Vitess, a MySQL-compatible, cloud-native database.
Consider sharding your database by user_id, channel_id and/or date to reduce the amount of data scanned by a single query. Cassandra, mentioned above, shards data automatically using the mandatory partition key defined when a table is created.
If you need message search, you should probably run a separate database specialized in full-text search. The best solution here seems to be Elasticsearch, which is heavily used at Discord. We also used it to build the message search engine for our biggest client, Trans.eu.
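
The sharding advice above can be sketched as a composite partition key built from channel_id and a date bucket. This is only one possible scheme under stated assumptions: the 10-day bucket width and the helper names `bucket_for`, `partition_key` and `buckets_between` are hypothetical and not taken from any specific database.

```python
from datetime import datetime, timezone

BUCKET_DAYS = 10  # assumed bucket width; tune it to your message volume
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_for(ts):
    """Index of the BUCKET_DAYS-wide window (counted from the Unix epoch) holding ts."""
    return (ts - EPOCH).days // BUCKET_DAYS


def partition_key(channel_id, ts):
    """Messages with the same key live in the same partition."""
    return (channel_id, bucket_for(ts))


def buckets_between(start, end):
    """Buckets a history query must read, newest first."""
    return list(range(bucket_for(end), bucket_for(start) - 1, -1))


utc = timezone.utc
jan_11 = datetime(2024, 1, 11, 9, 30, tzinfo=utc)
jan_15 = datetime(2024, 1, 15, 18, 0, tzinfo=utc)
jan_18 = datetime(2024, 1, 18, 7, 45, tzinfo=utc)

# Two messages a few days apart in the same channel share one partition.
same_partition = partition_key("general", jan_11) == partition_key("general", jan_15)

# Loading a week of history touches a small, known set of partitions.
history_buckets = buckets_between(jan_11, jan_18)
print(same_partition, history_buckets)
```

Bucketing by time keeps any single partition from growing without bound in a busy channel, while a query for recent history still reads only a handful of partitions.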

Principle 4: Prepare for traffic peaks

Running a large-scale application for millions of users carries the risk of temporary traffic peaks caused by more or less unpredictable circumstances or global events, such as Christmas Eve or New Year’s Eve.

When it comes to chat servers, the most resource-consuming events are high connection/disconnection rates, which trigger a large number of session initializations followed by a high number of outbound presence packets. In addition, your chat system may contain large group chats with millions of subscribers, where a single message has to be broadcast to an enormous number of recipients. All of this can temporarily overload your application, and your system should be prepared for that.

Actions:

Know your limits. You should stress test your application to know how much traffic it can handle and how well it scales.
Measure everything. Observability is crucial for large-scale applications, so measure as much as you can. This allows you to see when traffic peaks are coming and where the bottlenecks of your application or infrastructure are.
Configure your gateway to limit the rate of connections and disconnections hitting your application.
Consider autoscaling to be ready for traffic peaks while remaining cost-effective. Keep in mind that scaling down and up may increase the reconnection rate, which results in higher resource consumption. Typically, a gateway is used to solve this problem: the proxy keeps the client connection stable, while a separate server stores the user’s session data. The key here is the session handoff. Before a server is switched off, it must hand its sessions over to another server that can read the data. Once all the data from the server being deactivated has been reallocated to other servers, it can be safely switched off.
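
One common way to limit the connection rate at the gateway, as suggested in the actions above, is a token bucket. The sketch below is an assumption about how such a limiter could look, not a description of any particular gateway product: the `TokenBucket` class and the `rate` and `capacity` values are illustrative, and the demo uses a fake clock so that the result is deterministic.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, then `rate` new connections per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self._clock = clock             # injectable so the demo can control time
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def allow(self):
        """Return True if a new connection may be accepted right now."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


fake_now = [0.0]  # fake clock for the demo
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(5)]  # five connection attempts at once
fake_now[0] += 1.0                          # one second later, two tokens are back
later = [bucket.allow() for _ in range(3)]
print(burst, later)
```

Rejected clients should retry with a randomized backoff. Otherwise a mass reconnect after an outage arrives again as a single synchronized spike.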

Building a scalable backend for your chat application

Hopefully, the key principles described in this article show that building a scalable chat application is fraught with challenges. We’ve listed the principles we consider crucial in the process; however, the list isn’t exhaustive. There are more issues to overcome and more actions you can take to tackle them, all of which will depend on the nature and requirements of your project. Owners looking to release a new chat app in the near future should also be aware of the trends currently shaping the industry.

If you are considering building a new chat application from scratch and think we can help you through the process, contact me directly at magda@rst.software and I’ll be happy to connect you with one of our real-time messaging app specialists, who will discuss your exact requirements.