If you have been working in web development for any sizable period of time, chances are you have been asked to make sure that the application you are building is scalable. Building a web application that performs well at any scale is becoming a more common expectation of business owners around the world today.
But, what is scalability?
The word scalable is often (wrongly) used to mean that the application should work well (with respectable response times) under high load. In truth, that describes a high-performance application. You could throw a lot of resources at any application, even a poorly designed one, and make it perform well. A scalable application is one that can accommodate an increase (or decrease) in load without any downtime. An application might perform perfectly at the current load, but if it cannot keep up with an increase in load, it cannot be called scalable. The two concepts are closely related, but not interchangeable.
So, how can we build high performance applications?
Thinking about performance from the start is a great way to make sure you don’t find yourself in a tough spot halfway through. Some of the questions that you should ask at this stage are:
1) How much effort do you want to put in for improving performance?
2) What functionalities would be used most commonly?
3) What are the most resource intensive tasks?
There are a few design guidelines you can follow to make sure your application performs well.
Databases are probably the most common point of failure for most applications. Of course, each application is different, and if you are doing a lot of image or other file manipulation, that might become a bottleneck before the database shows any sign of trouble.
Implementing a cache can drastically reduce the load on your database if your application makes the same queries over and over. There are different types of caches. For example, you could use a proxy cache to serve common HTTP requests without even touching your application server. You could also use an object cache to store processed data from database queries. The key point is that you should only cache things that have reuse value. If you cache something that is only going to be read once afterwards, it is not going to be much of an optimization.
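To make the object cache idea concrete, here is a minimal sketch in Python. It assumes a single process and an in-memory dictionary; the names TTLCache and fetch_top_articles are made up for illustration, and in a real deployment you would more likely reach for a dedicated cache server shared by all application servers.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """A tiny in-memory object cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: do the expensive work once and remember the result.
        self.misses += 1
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value


def fetch_top_articles() -> list:
    # Hypothetical stand-in for an expensive database query.
    return ["article-1", "article-2", "article-3"]


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    for _ in range(5):
        cache.get_or_compute("top_articles", fetch_top_articles)
    print(f"hits={cache.hits} misses={cache.misses}")
```

In this sketch, five identical lookups result in one query and four cache hits, which is the reuse value the paragraph above talks about. A key that is requested only once would always be a miss and gain nothing from the cache.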
While working on a custom CMS for a website that got more than a million visitors a day, we implemented a simple proxy cache and it brought the CPU load from 70% down to 10%.
So, think about a caching plan while you are designing the application, and you are likely to see a significant improvement.
Asynchronous and background workers
Every application has some heavy lifting sections and some light ones. For example, a logout request might not do much other than invalidate the session token. On the other hand, consider an application that allows users to connect all their social profiles, import their data and process it to show some reports. Doing this involves making multiple network requests, depending on the number of profiles and the amount of data, and because these are external services you have no control over their response times. Performing this operation in real time would put a lot of unnecessary load on your application. A better way is to have a script running in the background that periodically checks for new requests, does all the heavy lifting and notifies your application once the operation is complete. Message queues are commonly used in such situations, but even a simple script running in the background would be better than doing such operations in real time.
Whenever you are in doubt about a heavy operation, just ask this question: would the end user have any problem if this operation were completed a minute later?
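As a rough illustration of the background worker pattern, the sketch below uses Python's standard queue and threading modules as a stand-in for a real message queue. The function names (import_profile, handle_import_request) and the job format are assumptions made up for this example.

```python
import queue
import threading
from typing import Dict, List

jobs: queue.Queue = queue.Queue()   # stand-in for a real message queue
results: Dict[int, str] = {}        # where finished work is published
results_lock = threading.Lock()


def import_profile(profile: str) -> str:
    # Hypothetical placeholder for slow calls to external services.
    return f"imported {profile}"


def worker() -> None:
    """Background loop: take jobs off the queue and do the heavy lifting."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value used to stop the worker
            jobs.task_done()
            break
        job_id, profiles = job
        summary = ", ".join(import_profile(p) for p in profiles)
        with results_lock:
            results[job_id] = summary
        jobs.task_done()


def handle_import_request(job_id: int, profiles: List[str]) -> str:
    """Request handler: enqueue the work and return immediately."""
    jobs.put((job_id, profiles))
    return "accepted"


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    print(handle_import_request(1, ["twitter", "facebook"]))
    jobs.join()          # wait here only for the demo; a web request would not
    print(results[1])
    jobs.put(None)       # ask the worker to stop
    thread.join()
```

The request handler does nothing more than put a job on the queue, so its response time no longer depends on the external services. In a real system the worker would usually run as a separate process or on a separate machine.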
Build a decoupled architecture
All complex applications are made up of many different components. For example, there might be a video processing component that only works when a user uploads a new video and generates files in different formats/sizes. Or there might be a reporting component that fetches a lot of data, performs operations on it and generates some reports. Building your application in a decoupled manner will allow you to allocate proper resources to the components that need them, when they need them.
Similarly, there might be a lot of different services as well. There is a web server serving your application files, there is a database, there might (should) be a cache, and there might be a message queue. All these services have different requirements from the system. If you run all of them on a single machine, it is very difficult to identify which services need more resources. For example, if the amount of data increases, the database and the cache will require more RAM, but the web server can still operate with the same resources. Running the services on different machines (or VPSs) makes it a lot easier to manage them and allocate the required resources.
Build Stateless services
Building stateless services is important because once you are ready to scale your application, you need services that can be started or stopped without any impact on the overall system. Scaling out a stateful service is comparatively a lot more work. A common example: let us say you have user session information stored on your application server. If the load increases and you decide to add more servers to handle it, you will also have to make sure that the session information is synced across all the servers, or that a given user always goes to the same server. Both are additional overheads that can be avoided if you build stateless services. Some services are inherently stateful, like databases, and scaling them is always more complicated than scaling their stateless counterparts.
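The session example can be sketched as follows. This is only an illustration: SharedSessionStore is an in-memory stand-in for an external store (such as a cache or a database) that every application server can reach, and all class and method names here are hypothetical.

```python
import secrets
from typing import Dict, Optional


class SharedSessionStore:
    """Stand-in for an external session store shared by all app servers."""

    def __init__(self) -> None:
        self._sessions: Dict[str, str] = {}

    def create(self, username: str) -> str:
        token = secrets.token_hex(16)
        self._sessions[token] = username
        return token

    def lookup(self, token: str) -> Optional[str]:
        return self._sessions.get(token)

    def delete(self, token: str) -> None:
        self._sessions.pop(token, None)


class AppServer:
    """Keeps no per-user state of its own, so any instance can serve any user."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, username: str) -> str:
        return self.store.create(username)

    def logout(self, token: str) -> None:
        self.store.delete(token)

    def whoami(self, token: str) -> str:
        user = self.store.lookup(token)
        return f"{self.name}: {user if user else 'anonymous'}"


if __name__ == "__main__":
    store = SharedSessionStore()
    server_a = AppServer("app-1", store)
    server_b = AppServer("app-2", store)
    token = server_a.login("alice")   # the login lands on one server
    print(server_b.whoami(token))     # a later request lands on another
```

Because the servers hold no session data themselves, a new one can be added or an existing one removed without syncing anything or pinning users to a particular machine. The state has not disappeared, though: it has moved into the shared store, which is now the stateful piece you need to take care of.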
Build fault tolerant systems
Know that things will go wrong. Believing that every part of your system can and probably will fail is important, because it forces you to think about the steps that can be taken to mitigate the issue as much as possible. Ask yourself questions like these:
How much downtime can the application afford?
This is going to be different for each application. A system like Cloudflare that sits in front of millions of applications can afford very little (if any) downtime, whereas a blog can afford a lot more.
What irreparable damages can happen?
What happens if your database fails? Do you have backups? How many copies? When was the last backup taken? Can you afford to lose the data created/updated after that point? What happens if your backup server fails? Do you have backups on multiple servers?
Understand that fault tolerance comes at a price. You need to decide how much you can afford to spend on it based on the value of what you are protecting.
This is in no way a complete list. Experts far more knowledgeable than me have written books that cover just this topic, but this should get you started.
Once you have built a system that performs well, you can start thinking about making it scalable.
How scalable do you want the system to be?
Scalability has a price: it takes time and effort. If you are just building a prototype, there is (probably) no point in making it scalable up to a million users. Even large applications like Facebook or Twitter have been rewritten multiple times.
If you are building a system for a client, let them know that it would add to the cost of the project. There is going to be a big difference in the cost of a system that can serve 100 realtime users versus one that can serve a million. Keep this transparent from the start because if the application gets a lot of traction and the system is not able to handle the load, you are going to be in a tough spot.
A scalable system is one that keeps performing well even as the number of users increases. That involves adding or removing servers based on the traffic. The first step is knowing when to scale.
Monitoring all the systems and components manually is going to be very difficult, if not impossible. You need to build dashboards that give you a drill-down overview of the system’s current and historical state. You will also need to set up alarms that are triggered when the load reaches a particular level. Doing this is far easier if you are using a cloud provider like AWS than if you are implementing it on your own with open source software. On AWS you can use CloudWatch and implement auto scaling. If you are doing it on your own, you can use something like Nagios to set up the dashboard and use scripts to create new servers and add them to the load balancer.
If you are using AWS, make sure that you have set up billing alerts or a budget on your account, or you might wake up to a $10,000 bill one morning.
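The decision such an alarm feeds into can be sketched in a few lines. This is not how any particular cloud provider or monitoring tool implements it; the thresholds, the one-server-at-a-time step and the function name are assumptions chosen only to illustrate the idea.

```python
from statistics import mean
from typing import Sequence


def desired_server_count(
    cpu_samples: Sequence[float],
    current_servers: int,
    scale_out_above: float = 70.0,  # assumed threshold, in percent CPU
    scale_in_below: float = 25.0,   # assumed threshold, in percent CPU
    min_servers: int = 1,
    max_servers: int = 10,
) -> int:
    """Suggest a server count based on the average of recent CPU readings."""
    if not cpu_samples:
        return current_servers  # no data, so change nothing
    average = mean(cpu_samples)
    if average > scale_out_above:
        target = current_servers + 1
    elif average < scale_in_below:
        target = current_servers - 1
    else:
        target = current_servers
    # Never go below the minimum or above the budgeted maximum.
    return max(min_servers, min(max_servers, target))


if __name__ == "__main__":
    print(desired_server_count([80.0, 90.0, 85.0], current_servers=2))
    print(desired_server_count([10.0, 15.0], current_servers=2))
```

Averaging several samples rather than reacting to a single reading avoids adding a server for a momentary spike, and the max_servers cap doubles as a crude guard on cost.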
Use a load balancer
If you follow the ideas mentioned above and build a stateless system, it is very straightforward to add a load balancer that distributes requests across different servers. You can keep adding and removing application servers based on the load. And if you built a decoupled system, you can choose to scale just the components that are experiencing heavy load.
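To show what a load balancer does at its simplest, here is a toy round-robin sketch. Real load balancers also do health checks, connection handling and much more; the class name and server names below are made up for illustration.

```python
from typing import List


class RoundRobinBalancer:
    """Hands out servers in rotation; servers can be added or removed at runtime."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._index = 0

    def add_server(self, server: str) -> None:
        if server not in self._servers:
            self._servers.append(server)

    def remove_server(self, server: str) -> None:
        if server in self._servers:
            if len(self._servers) == 1:
                raise ValueError("cannot remove the last server")
            self._servers.remove(server)

    def next_server(self) -> str:
        # The modulo keeps the index valid even after servers were removed.
        server = self._servers[self._index % len(self._servers)]
        self._index = (self._index + 1) % len(self._servers)
        return server


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2"])
    print([balancer.next_server() for _ in range(4)])
    balancer.add_server("app-3")  # scale out under load
    print([balancer.next_server() for _ in range(3)])
```

This only works cleanly because the application servers are stateless: the balancer is free to send consecutive requests from the same user to different machines.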
Cost of scalability
You need to decide whether you need a scalable system right now. There are some baseline costs associated with building a scalable system. If your entire application can run on a $5 VPS at your current load, splitting it across different servers is going to bring the cost up. If you just need one application server right now, putting a load balancer in front of it is going to roughly double the cost. Ideally, you should run load tests to find the limit of your system, preferably in a staging environment, and plan accordingly.
To sum up, if your system is built with a focus on performance, making it scalable is going to be a lot easier. The architecture of the application is what makes it easier or harder to scale. There are costs associated with a scalable system, and you should discuss them with the business owner or the client.
All this is barely scratching the surface. But it can be an entry point for someone who is starting to focus on scalability and performance.