Rapid advancements in technology have led to an explosive growth in data and user traffic. Therefore, it is of paramount interest that an application's infrastructure can scale seamlessly with the growth of users and increase in data volume.
How do you create a codebase that adopts changes gracefully alongside a project as it grows? The more complex a project becomes, the more difficult it is to address capacity limits, bandwidth issues, and drops in performance.
Thoughtful design decisions can set a solid foundation for software systems to be highly scalable, a quality that is often critical to their long-term success.
As you gain more experience as a developer, you'll find that system design decisions can significantly impact the problems you face. Design decisions, software choices, and IT infrastructure can make software engineering easier, or they can add to the very problems you're trying to solve. Over time, you may develop an intuitive feel for which software systems are scalable. Still, you will eventually need to know how to identify and understand different properties of software systems, and how those properties can extend or shorten their lifespan.
In the first half of this article, we'll talk about why scalability is important, the mechanisms for achieving scalability, and the dimensions of scalability. In the second half, we'll go over areas where you can identify performance bottlenecks and methods to test and optimize your system. Afterward, we'll conclude with a few useful resources to further enhance your knowledge in this area.
Let's dive right in!
We'll cover:
- What is scalability?
- Mechanisms to achieve scalability
- Dimensions of scalability
- Identifying performance bottlenecks
- Testing the performance of your application
- Testing the scalability of your application
- Wrapping up and next steps
What is scalability?
The concept of scalability refers to the desirable attribute of a system, network, or process to accommodate increasing numbers of elements or objects, handle growing volumes of work efficiently, and have room for expansion[1].
In other words, a system is scalable when an influx of users, traffic volume, or bandwidth usage does not impact its performance or functional requirements.
Latency vs bandwidth vs throughput
Latency is the time it takes for a system to respond to a user's request.
There are two main types of latency:
- Network latency is the time it takes for a network to send a data packet from point A to point B.
- Application latency is the time an application takes to process a user request.
Businesses cut down on network latency by using content delivery networks (CDNs) to deploy servers as close to end users as possible. These sites are called edge locations. One way to minimize application latency is to run stress and load tests that scan for bottlenecks in the system.
Bandwidth is the maximum capacity of a network or computing system to transmit data over a specific period of time.
Throughput measures the amount of data that a network or computing system successfully sends or receives over a period of time.
Note: If we think of data packets as cars on a highway, then latency, bandwidth, and throughput map neatly onto a stretch of road. Latency corresponds to the time it takes you to reach your destination, bandwidth corresponds to the number of lanes available to drivers, and throughput corresponds to the number of cars that reach their destinations in a given time period.
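To make the distinction concrete, here's a toy calculation with made-up numbers: bandwidth caps how fast data *can* move, while throughput is what actually arrives.

```python
# Toy numbers, purely for illustration.
bandwidth_mbps = 100      # link capacity: 100 megabits per second
payload_megabits = 250    # size of the file we want to transfer
latency_s = 0.040         # 40 ms before the first byte arrives

# Best case: the link is saturated, so throughput equals bandwidth.
ideal_transfer_s = payload_megabits / bandwidth_mbps
print(f"Ideal transfer time: {ideal_transfer_s:.1f} s")  # 2.5 s

# In practice, congestion and protocol overhead keep throughput below bandwidth.
observed_throughput_mbps = 60
actual_transfer_s = latency_s + payload_megabits / observed_throughput_mbps
print(f"Observed transfer time: {actual_transfer_s:.2f} s")  # ~4.21 s
```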
When is scalability desirable?
Not every system, network, or process needs to be scalable. Many systems run perfectly fine without being configured for scalability because the capacity of their workload is limited.
However, for online businesses and enterprise software, prioritizing scalability is essential. Time is money, and businesses have spent hundreds of millions of dollars to shave off just milliseconds of latency.
People hate lag, so latency often plays a significant role in determining whether an online business wins or loses a customer. And as enterprises grow, so do the demands of their users. For an enterprise to keep up with its growth rate, it needs to ensure that its systems are scalable.
A system with poor scalability can increase labor costs and response times and even compromise the quality of a product or service.
Note: Based on your growth predictions, decide how many users and how much data your design needs to support. Then ask yourself, "Can this design support x times that number of users, and that volume of data, in the next y years?"
Mechanisms to achieve scalability
There are two ways to scale an application: vertically and horizontally.
Vertical scaling (scale up)
Vertical scaling means adding more resources to the existing nodes of a system to handle increased loads. Here's an architectural analogy for you to consider.
You're the developer of a high-rise apartment building, and you want to accommodate as many tenants as possible. To increase occupancy limits, you would need to build additional floors.
The same rule holds if a server with 16 gigs of RAM hosts an application that sees an explosion in traffic. One way to accommodate this extra traffic would be to increase the RAM to 32 gigs.
Vertical scaling is often the simplest solution as it doesn't require any complex code extensions or configurations. With vertical scaling, you are just increasing the available computational power, RAM, and bandwidth of the existing nodes in your system.
However, vertical scaling has constraints that hinder its indefinite use. Let's go back to our apartment analogy. Your high-rise apartment building can comfortably accommodate 10,000 tenants. You are asked to accommodate 1,000,000 tenants.
It becomes clear that you can't just keep adding floors to the same building if you want to create housing for 1,000,000 tenants. There are physical and financial limitations that prevent you from vertically scaling forever. Eventually, you're going to need to build more buildings.
Similarly, you can't augment the processing power, RAM, or bandwidth of a single server indefinitely. The cost of vertically scaling a system rises steeply with each upgrade and quickly becomes cost-prohibitive.
At some point, you'll need to wrangle more servers into your system to handle all the traffic you'll be seeing.
Horizontal scaling (scale out)
"In pioneer days, they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers."
Grace Hopper
When you scale a system horizontally, you are adding more nodes (such as servers or routers) of comparable hardware to increase the available resources of a system.
There are significantly fewer physical and financial limitations to horizontal scaling. It is much easier and more cost-effective to add servers or set up additional data centers than to make one server extremely powerful. So, the main practical limitation of horizontal scaling is how many resources you have.
However, there are two key pitfalls of horizontal scaling.
- When a large number of commodity servers are in use at a given time, some will inevitably fail to respond (for any number of reasons). The system itself must therefore be fault-tolerant, remaining functional as long as a majority of servers are still responding.
- A complex middleware to manage and seamlessly distribute user requests to servers must be developed or acquired.
Horizontal scaling also allows for dynamic scaling to handle changes in traffic efficiently. When traffic climbs, more servers come online; when traffic drops, servers are taken offline to conserve resources.
Dynamic scaling is arguably the most prominent reason why cloud computing is so popular. Cloud computing provides horizontal scaling advantages without requiring companies to go through the hassle of building data centers.
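As a rough sketch of what a dynamic scaling policy boils down to (the thresholds and node limits here are made-up illustrations, not any cloud provider's actual API), the core is a feedback loop over a load metric:

```python
def desired_node_count(current_nodes: int, avg_cpu_percent: float,
                       min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Toy autoscaling rule: add a node under heavy load, remove one when idle."""
    if avg_cpu_percent > 75:        # made-up scale-out threshold
        target = current_nodes + 1
    elif avg_cpu_percent < 25:      # made-up scale-in threshold
        target = current_nodes - 1
    else:
        target = current_nodes
    return max(min_nodes, min(max_nodes, target))

print(desired_node_count(4, avg_cpu_percent=82.0))  # 5 -> traffic climbing, scale out
print(desired_node_count(4, avg_cpu_percent=12.0))  # 3 -> traffic dropping, scale in
```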
Dimensions of scalability
There are several ways to measure scalability, but each metric captures how well a system accommodates a particular kind of growth.
Size scalability refers to the ability of a distributed system to maintain or improve performance as resources are added. If you're horizontally scaling a size-scalable system, adding more nodes to accommodate a sudden uptick in traffic shouldn't slow down or disrupt system performance.
Administrative scalability is a dimension of scalability concerned with the manual labor required to manage and maintain a system. An administratively scalable system won't need a considerable increase in administrative overhead when adding new nodes.
Geographical scalability is a dimension of scalability concerned with the impact of physical distance on a system's performance. The distance between two nodes can significantly impact the amount of time it takes for them to communicate.
Load scalability refers to the ability of a distributed system to accommodate heavier and lighter loads flexibly. The definition of load scalability also includes the ability of a system to readily modify, integrate, or remove components to accommodate changing loads.
Functional scalability refers to the ability of a system to add new features and functions without disrupting operations or slowing down performance.
Identifying performance bottlenecks
Performance bottlenecks can compromise the scalability of an application, so it's good to be aware of common reasons why bottlenecks occur.
Monolithic database
Suppose your application has high latency. The application itself is well architected, and multiple nodes handle its workload. Your application can scale horizontally, and everything looks fine.
What could be the source of the lag? You could be dealing with a monolithic database, where one database is responsible for handling all of the data requests for every server node. There are only so many requests a single database can respond to at a time.
One way to make sure your database is scalable is to make wise use of database partitioning and sharding.
Database partitioning splits a large database into smaller pieces; when those pieces are distributed across separate machines, they're known as shards. Either way, the goal is faster data retrieval.
For example, let's say you have a database of 500,000 employees spanning the entire globe. Going through each of those entries every time a request is made would be a massive waste of time.
You could split that database up into many smaller databases organized by continent, country, company branch, or other ways.
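As a minimal sketch of the idea, the routing logic below sends each query to a small, region-specific shard instead of scanning the monolith. The shard names and the `region` field are hypothetical:

```python
# Minimal sharding sketch; the shard names and region field are hypothetical.
SHARDS_BY_REGION = {
    "americas": "employees_americas",
    "emea": "employees_emea",
    "apac": "employees_apac",
}

def shard_for(employee: dict) -> str:
    """Route a query to the small, region-specific database."""
    return SHARDS_BY_REGION[employee["region"]]

# A lookup now scans one regional shard instead of all 500,000 rows.
print(shard_for({"id": 1042, "region": "emea"}))  # employees_emea
```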
Wrong type of database
Picking a suitable database for your application can significantly impact latency. If you're going to perform many transactions and need strong consistency, then going with a relational database would be the best choice.
In this context, strong consistency refers to the data having consistent values across all server nodes at any point in time.
If strong consistency is not a priority for your application, but horizontal scalability is, then a NoSQL database would be a good choice for your database. Knowing what type of data storage works best for your business needs and considering the tradeoffs of one technology over another early on is an easy way to avoid this particular bottleneck.
Architectural mistakes
A poorly designed software architecture isn't always noticeable at first, but its shortcomings often become glaringly obvious over time. Choosing the wrong software architecture can make adding new features difficult and maintenance costly, and it may shorten a software system's lifespan considerably.
One example of an architectural mistake is scheduling processes to occur sequentially instead of asynchronously. If you're dealing with multiple concurrent requests without any dependencies, then asynchronous processes and modules are one way to shorten the time spent completing those requests.
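To make the difference concrete, here's a minimal Python sketch using asyncio. The two-second fetch is simulated, and the resource names are hypothetical:

```python
import asyncio
import time

async def fetch(resource: str) -> str:
    await asyncio.sleep(2)  # stand-in for a slow network or database call
    return f"{resource}: done"

async def main() -> None:
    # Run sequentially, these three independent requests would take ~6 s.
    # With no dependencies between them, gather() runs all three at once.
    start = time.perf_counter()
    results = await asyncio.gather(fetch("users"), fetch("orders"), fetch("invoices"))
    print(results, f"elapsed: {time.perf_counter() - start:.1f} s")  # ~2.0 s

asyncio.run(main())
```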
Start considering system scalability as early as possible, and create performance models to spot potential bottlenecks before you start building.
Lack of caching
Ineffective caching in an application can also lead to scalability issues. Caching is vital to the performance of any application, ensuring low latency and high throughput. Caches intercept requests before they reach the database; every request served from the cache frees the database to work on other requests.
Origin servers contain the original version of your website, which is cached by servers at edge locations. CDNs cache copies of your web pages (often compressed) to decrease round-trip time (RTT) and latency.
If you're working with a system with a significant amount of static data, caching any frequently accessed data from the database and storing it in RAM can speed up response times and considerably bring down deployment costs.
Note: There are very few cases where caching doesn't help. However, you should still try to be strategic about caching to avoid data inconsistencies. Look for patterns, and focus on caching frequently accessed content first.
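Here's a minimal sketch of the cache-aside pattern. The in-memory dict stands in for a real cache such as Redis, and `query_database` is a hypothetical placeholder for a slow read:

```python
cache: dict[str, str] = {}  # stands in for an in-memory cache such as Redis

def query_database(key: str) -> str:
    """Hypothetical stand-in for a slow database read."""
    return f"value-for-{key}"

def get(key: str) -> str:
    if key in cache:             # cache hit: the database never sees the request
        return cache[key]
    value = query_database(key)  # cache miss: fetch once...
    cache[key] = value           # ...then keep it in RAM for next time
    return value

get("homepage")  # miss -> hits the database
get("homepage")  # hit  -> served from memory
```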
No load balancers
Load balancers use several algorithms to distribute heavy traffic loads across multiple servers evenly.
Here are some algorithms used for load balancing:
- Least Response Time: This algorithm directs traffic to the server with the fewest active connections and the shortest latency.
- Round Robin: This algorithm rotates servers through a queue. Traffic is distributed through the queue based on which server is next in line.
- IP Hash: The IP address of the client determines which server receives their requests.
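Of the algorithms above, round robin and IP hash are simple enough to sketch in a few lines. The backend pool below is hypothetical:

```python
from itertools import cycle

# Hypothetical backend pool; a real load balancer would also health-check these nodes.
POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(POOL)

def round_robin() -> str:
    """Round robin: each request goes to the next server in the rotation."""
    return next(rotation)

def ip_hash(client_ip: str) -> str:
    """IP hash: the same client maps to the same server (within one process here;
    a production balancer would use a hash that is stable across restarts)."""
    return POOL[hash(client_ip) % len(POOL)]

print(round_robin(), round_robin())  # 10.0.0.1 10.0.0.2
print(ip_hash("203.0.113.7"))        # a consistent pick for this client
```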
Load balancing mitigates the same class of problem you saw with a monolithic database: all of the traffic converging on a single node. When a server goes down, a load balancer automatically reroutes requests to other server nodes in the cluster to keep the entire system online.
Note: Software load balancers come with additional features like predictive analytics for pinpointing potential bottlenecks.
Poorly written code
Inefficient or poorly written code is another way to hurt the scalability of your application, and it slows down software development in general. Unnecessary or deeply nested loops and tightly coupled code can bring down an entire service in production.
Tight coupling refers to components that rely heavily on each other. You want to avoid situations where you have business logic operating in the database, for example. Tightly coupled application components can make testing and code refactoring needlessly complicated.
Loose coupling can deliver greater flexibility to your codebase and make scaling a lot easier.
Ignoring algorithmic complexity, described with Big O notation, can also hurt your application's scalability. Big O measures how an algorithm's cost grows across two dimensions: time (how long a function takes to execute) and space (how much memory a function needs to execute).
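A small example of why this matters at scale: finding duplicate user IDs with a nested loop is O(n²), while a set-based pass is O(n). At a million users, that's on the order of a trillion pairwise comparisons versus a single linear pass.

```python
def has_duplicates_quadratic(ids: list[int]) -> bool:
    """O(n^2) time: compares every pair; fine for 100 IDs, hopeless for 1,000,000."""
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if ids[i] == ids[j]:
                return True
    return False

def has_duplicates_linear(ids: list[int]) -> bool:
    """O(n) time: trades a little memory (the set) for a single pass."""
    seen: set[int] = set()
    for user_id in ids:
        if user_id in seen:
            return True
        seen.add(user_id)
    return False

print(has_duplicates_linear([7, 12, 9, 12]))  # True
```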
Testing the performance of your application
Here are a few tips you can use to optimize the performance of your application to handle more traffic with fewer resources. Optimizing for performance is key to ensuring high scalability because if an application is not performant, it will not scale well.
- Profile your application. Profiling is the dynamic analysis of your code and can help you eliminate concurrency and memory errors. Profiling also improves the overall robustness and safety of your program. Run an application profiler and a code profiler, or look into the many performance analysis tools used in the industry (see the sketch after this list).
- Use a CDN. CDNs reduce latency caused by the physical distance between users and origin servers.
- Compress data. Use compression algorithms to store data in its compressed form. Compressed data consumes less bandwidth and downloads much faster than uncompressed data.
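Profiling tools vary by stack; as one concrete example, Python's standard library ships with cProfile. The minimal run below is a sketch, where `handle_request` is a made-up stand-in for your own hot path:

```python
import cProfile
import pstats

def handle_request() -> int:
    """Hypothetical hot path we want to profile."""
    return sum(i * i for i in range(100_000))

cProfile.run("handle_request()", "profile.out")  # record a profile to disk
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)    # top 5 functions by cumulative time
```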
Testing the scalability of your application
Now that you've tested your application for performance, you want to start thinking about capacity and provisioning the right amount of computational and storage power. There are many different approaches to test your application for scalability, and the approach you choose will depend on the overall design of your system.
Testing is common at both the hardware and software levels. You will need to test different services and components individually and collectively to identify performance bottlenecks in your system.
Parameters to consider:
- CPU usage
- Network bandwidth consumption
- Throughput
- Number of requests processed within x amount of time
- Latency
- Memory usage of the application
- UX during heavy traffic
In the testing phase, stress and load tests are used to simulate traffic and to see how a system behaves and scales under heavy loads.
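Here's a minimal load-test sketch of that idea. The URL is a placeholder, and dedicated tools like JMeter or Locust do this at far larger scale, but the shape is the same: fire concurrent requests, then report latency and throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://localhost:8000/"  # placeholder endpoint for the system under test

def timed_request(_: int) -> float:
    start = time.perf_counter()
    with urlopen(URL) as resp:
        resp.read()
    return time.perf_counter() - start  # per-request latency in seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=50) as pool:  # simulate 50 concurrent users
    latencies = list(pool.map(timed_request, range(500)))
elapsed = time.perf_counter() - start

print(f"throughput: {len(latencies) / elapsed:.1f} req/s")
print(f"p95 latency: {sorted(latencies)[int(0.95 * len(latencies))] * 1000:.0f} ms")
```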
Wrapping up and next steps
Wow! We covered a lot today. System design covers many topics, but hopefully, you are now more familiar with some fundamental concepts behind scalability, and why scalability is so critical to good software design. Although technological innovations (like cloud computing) might change the way we solve system design-related problems, it is a near certainty that we will always need to think about how our systems can be designed to adapt and persist well into the future.
If you enjoyed learning about the different dimensions of scalability, performance bottlenecks, and how these things affect the user experience, then a career in system design might be right for you.
To get started with learning these concepts and more, check out Educative's Scalability & System Design for Developers learning path.
Happy learning!
Continue reading about system design on Educative
- Grokking Modern System Design for Software Engineers and Managers
- How to Design a Web Application: Software Architecture 101
- System design primer: Learn the basics of system design
- System design fundamentals: What is the CAP theorem?
Start a discussion
What is your favorite system design concept? Was this article helpful? Let us know in the comments below!