Chat applications typically serve high and fluctuating numbers of concurrent users – between thousands and millions, and some of them will use your app more heavily than others. Providing a dependable chat functionality that is highly reliable and delights end-users is a complex challenge. Here are four key principles you should follow to ensure you engineer a real-time chat application backend that is scalable and doesn’t crash.
Principle 1: Ensure a reliable connection
When you are planning to build a large-scale chat app, the first thing you should take into account is providing a reliable connection to the end users. Messaging apps like WhatsApp, Slack or Discord handle hundreds of thousand to millions connections per node. That isn’t easy to achieve at that scale because of system limitations, network latency and inhomogeneous network between servers and clients.
Temporary disconnections happen especially for mobile devices where network problems are really common. When sudden network fluctuations happen, TCP connection can fall into a half-open state when one of the connection ends doesn't know that the other end is unavailable. How to tackle this issue?
- Use the geolocation of your users to connect them to the nearest datacenter to minimize latencies. Amazon Route 53 can be helpful at this - the DNS service provides geolocation routing that allows developers to allocate resources that serve the traffic based on users’ geographic location. With this DNS, you could easily reroute queries from a specific region to a load balancer located in a specific city.
- Provide a gateway cluster (separately from application machines) which will be responsible for connection handling.
- Implement connection failure discovery in your client and server application (e.g. keepalive or ping).
- Implement some kind of message stream management to negotiate which incoming and outgoing messages didn’t reach the client and/or server due to connection problems (e.g. XEP-0198).
Principle 2: Ensure efficient message fanout
The main responsibility of the chat app backend is sending messages, presences and notifications between senders and receivers in real time. These could be one-to-one direct messages, one-to-many group messages, broadcasting presence changes to subscribers and/or some kind of notifications like chat states or message statuses.
WhatsApp handles ~20B inbounding and ~40B outbounding messages per day. To handle that number and provide reliability and better availability you have to split traffic to multiple servers. At this scale, ensuring that the servers communicate with each other to connect message senders and recipients that are likely logged on to different server instances (or multiple servers, as far as group messaging is concerned) or even different datacenters, is a real challenge. Here’s how to tackle it.
Principle 3: Choose the right database
Data storage is another key aspect to take care of when building a real-time messaging app. Large chat apps handle hundreds of millions to billions messages per day and each of them should be stored and replicated globally to provide chat history for all interested users.
Most chat apps have an overall read/write ratio about 50/50, so it is not possible to optimize the database directly for writes or for reads. When selecting a database, you have to also consider requirements like: linear scalability, automatic failover, low maintenance, predictable performance and easy distribution around the cluster.
- Choose a database engine that fullfils your application requirements. If you operate at large scale you should probably consider one of Amazon’s Dynamo architecture based databases - DynamoDB, Cassandra, Riak or ScyllaDB. For instance, one of the big chat companies - Discord, was using Cassandra and then migrated to ScyllaDB. You can also use traditional RDBMS like MySql or Postgres but you should use an external layer to handle low latency cross datacenter replication. That kind of layer is provided by AuroraDB in AWS. On the other hand, Slack uses Vitess which is MySQL-compatible, cloud-native database.
- Consider sharding your database by user_id, channel_id and/or date to lower the amount of data looked up in a single query. Cassandra, mentioned before, handles sharding automatically by mandatory partition key defined during table creation.
- If you need message search functionality, you should probably have a separated database specialized in full text searching. The best solution here seems to be Elastic which is heavily used in Discord. We also used it to build the message search engine for our biggest client, Trans.eu
Principle 4: Prepare for traffic peaks
Running a large-scale application for millions of users carries the risk of temporary traffic peaks caused by more or less unpredictable circumstances or global events, such as Christmas Eve or New Year’s Eve.
When it comes to chat servers, the most resource-consuming events are high connection/disconnection rates which result in a large number of session initialization processes and then high number of outbounding presence packets. In addition, you can have large group chats with millions of subscribers in your chat system, where one message should be broadcasted to an enormous number of recipients. All of this can result in temporary overload of your application and your system should be prepared for this situation.
- Know your limits. You should stress test your application to know how much traffic it can handle and how well it scales.
- Measure everything. Observability is crucial for large scale applications so try to measure as much as you can. This allow you to see when traffic peaks are coming and where are bottlenecks of your application or infrastructure
- Configure your gateway to limit high connection/disconnection rate hitting your application
- Consider autoscaling to be ready for traffic peaks while remaining cost effective (downscaling and upscaling may lead to increased reconnection rate which results in higher resource consumption). Typically, a gateway is used to solve this problem: the proxy takes care of keeping a stable connection, while a separate server stores the user’s session data. The key here is the session handoff. Once a server is ready to be switched off, it must hand off the session to another server to read the data. Once all the data from a server that is about to be deactivated has been reallocated to other servers, it can be securely switched off.
Building a scalable backend for your chat application
Hopefully, the key principles described in this article give you an idea that building a scalable chat application is marred with a number of challenges. We’ve listed the principles we consider crucial in the process, however, the list isn’t exhaustive. There are more issues to be overcome and more actions you can take to tackle them, all of which will depend on the nature and requirements of your project. Owners looking to release a new chat app in the nearest future should also be aware of the trends that are currently shaping the industry.
If you consider building a new chat application from scratch and think we can help you throughout the process, contact me directly at email@example.com and I’d be happy to connect you with one of our real-time messaging app specialists who will discuss your exact requirements.
- Using rust to scale elixir for 11 million concurrent users
- How Discord Scaled Elixir to 5,000,000 Concurrent Users
- How discord stores billions of messages
- How discord indexes billions of messages
- Scaling Slack - The Good, the Unexpected, and the Road Ahead
- How Slack Works
- That's 'Billion' with a 'B': Scaling to the Next Level at WhatsApp
- A Reflection on Building the WhatsApp Server
- Scaling Erlang cluster to 10,000 nodes