System Design primer: Learn the basics of System Design

Systems Design is the process of defining the architecture, interfaces, and data for a system that satisfies particular requirements. After you have your requirements for your system, the next step is translating them into technical specifications so you can construct your system.

This is where System Design comes in. System Design gives you a technical solution for your requirements. System Design is an iterative process, so you may end up with multiple designs that will meet the system requirements.

System design is a huge topic. Everyone has a different approach to it as there are no step-by-step guidelines. In this article, we will go through the basics to give you an idea of what it is and how it works.

What is system design?

As mentioned previously, Systems Design is the process of defining the architecture, interfaces, and data for a system that satisfies particular requirements. System design should satisfy the specific needs of a business or organization through a coherent, well-running system.

After you have your requirements, you examine and transform them into a physical system design that addresses the customers’ needs. The design activity will vary depending on whether you go for custom development, commercial solutions, or a combination of both.

Systems Design requires a systematic approach to building and engineering a system. A good system design requires engineers to think about everything in an infrastructure, from the hardware/software, down to the data and how it's stored.

System Design includes the following design methods:

  • Architectural design: describes the views, models, behavior, and infrastructure of a system.
  • Logical design: represents the data flow and inputs/outputs of a system.
  • Physical design: includes how users can add information, how a system represents information to users, and how data is modeled/stored.

Different kinds of systems

There are different methods you can use to satisfy your system’s requirements for scalability, reliability, security, performance, and consistency.

Horizontal Scaling

In horizontal scaling, you add more machines in parallel to deal with increasing demand. You will need load balancing to distribute the load across them. If any machine fails, its requests are redirected to the other machines, and the system scales well as your user base grows. The drawback is that data can become inconsistent across machines.
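As a rough illustration of the load balancing and failover described above, here is a minimal round-robin sketch. The class name `RoundRobinBalancer`, its methods, and the server names are hypothetical, and a real load balancer would also run health checks instead of being told which machines are down.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # endless iterator over the servers

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        # Only servers that belong to the pool can become healthy again.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # Walk the ring until a healthy server comes up.
        for server in self._ring:
            if server in self.healthy:
                return server


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [balancer.next_server() for _ in range(3)]

# Simulate a machine failure: its share of requests moves to the others.
balancer.mark_down("app-2")
after_failure = [balancer.next_server() for _ in range(4)]
print(first_round, after_failure)
```

Round-robin is only one possible policy; least-connections or weighted schemes follow the same shape with a different selection rule.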

Vertical Scaling

Vertical scaling uses one large machine that handles all your requests, improving response time and throughput by adding resources to that machine. Though it offers fast calls within one machine instead of network calls, data consistency, and no need for load balancing, you have a single point of failure and run into hardware limits.

Monolith applications

These are single-tiered applications in which the different components are combined into a single platform. They suit small teams because they are less complex, avoid duplication, and use fast in-process procedure calls. However, they can become difficult to maintain if they grow too large or complex.

Microservices

Microservices allow you to develop software systems as single-function modules with well-defined interfaces and operations. They are highly testable, maintainable, and independently deployable. In exchange, microservices are more complex to operate and require cultural shifts in the organizations adopting them.

Step 1: Requirements clarifications

This is an important step because you need to narrow down to a specific goal so you don’t overcomplicate things. Clarifying your goal helps you focus on the main features, remove ambiguities, and identify potential bottlenecks. We can divide our requirements into two parts:

Functional Requirements

Functional requirements are requirements the system has to deliver. These are the main goals of the system. Functional requirements include things like business rules, authentication, administrative functions, authorization levels, etc.

Non-Functional Requirements

Non-functional requirements restrict the system design through different qualities. They need to be analyzed, and if they are not fulfilled, they can harm the business plan or goals. Non-functional requirements include performance, security, reliability, scalability, maintainability, availability, etc. All these different parameters help you analyze and determine if your system is designed properly.

Let’s take Twitter as an example. Some functional requirements can include:

  • Users should be able to post new tweets
  • Users should be able to follow other users
  • Users should be able to mark tweets as favorite

The non-functional requirements can include:

  • High availability
  • Consistency
  • A latency of around 200ms for timeline generation

These are some basic requirements that can further be extended to include searching, replying to tweets, tagging users, notifications, trending topics, etc.
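One way to make the three functional requirements concrete is to write them down as an interface. The sketch below is a toy in-memory version; the class `InMemoryTwitter`, its method names, and its storage layout are all hypothetical and ignore the non-functional requirements entirely.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class InMemoryTwitter:
    """Toy service exposing the three functional requirements as methods."""

    tweets: dict = field(default_factory=dict)  # tweet_id -> (user_id, text)
    following: dict = field(default_factory=lambda: defaultdict(set))
    favorites: dict = field(default_factory=lambda: defaultdict(set))

    def post_tweet(self, user_id, text):
        """Users should be able to post new tweets."""
        tweet_id = len(self.tweets) + 1
        self.tweets[tweet_id] = (user_id, text)
        return tweet_id

    def follow(self, follower_id, followee_id):
        """Users should be able to follow other users."""
        self.following[follower_id].add(followee_id)

    def favorite(self, user_id, tweet_id):
        """Users should be able to mark tweets as favorite."""
        if tweet_id not in self.tweets:
            raise KeyError(f"unknown tweet: {tweet_id}")
        self.favorites[user_id].add(tweet_id)


service = InMemoryTwitter()
tweet_id = service.post_tweet("bob", "hello world")
service.follow("alice", "bob")
service.favorite("alice", tweet_id)
```

Writing the requirements as method signatures early on makes it easier to spot which operations the later estimation and data-model steps have to support.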

Step 2: Estimation of important parts

This step is about the scale of your system. How you measure it will vary depending on your system. You need to keep in mind parameters like the number of queries per second and the data the system will be required to handle.

For Twitter, we will need to keep in mind parameters like storage, bandwidth estimation, total tweet views, etc. Let’s say we have around 200 million daily active users and 100 million new tweets per day, and each user follows about 200 people.

Storage

Assuming each tweet has 140 characters, each character takes two bytes to store without compression, and each tweet needs an additional 30 bytes of metadata, the total storage needed per day will be around:

100M × (280 + 30) bytes ≈ 31 GB/day
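The storage arithmetic can be checked with a few lines of Python. The constant names are illustrative, and the result uses decimal units (1 GB = 10^9 bytes), which is an assumption about how the estimate is meant to be read.

```python
TWEETS_PER_DAY = 100_000_000  # 100 million new tweets per day
CHARS_PER_TWEET = 140
BYTES_PER_CHAR = 2            # uncompressed
METADATA_BYTES = 30

bytes_per_tweet = CHARS_PER_TWEET * BYTES_PER_CHAR + METADATA_BYTES
storage_bytes_per_day = TWEETS_PER_DAY * bytes_per_tweet
storage_gb_per_day = storage_bytes_per_day / 1e9

print(f"{bytes_per_tweet} bytes per tweet, {storage_gb_per_day:.0f} GB/day")
```

This covers tweet text and metadata only; media is accounted for in the bandwidth estimate.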

Bandwidth Estimate

Ingress:

If every fifth tweet has a picture of 200KB and every tenth a video of 2MB, ingress would be:

(100M/5 photos × 200 KB) + (100M/10 videos × 2 MB) = 24 TB/day
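The same ingress figure, spelled out step by step. The constant names are illustrative and the units are decimal (1 TB = 10^12 bytes).

```python
KB = 1_000
MB = 1_000_000
TB = 1_000_000_000_000

TWEETS_PER_DAY = 100_000_000

photos_per_day = TWEETS_PER_DAY // 5    # every fifth tweet has a photo
videos_per_day = TWEETS_PER_DAY // 10   # every tenth tweet has a video

photo_bytes = photos_per_day * 200 * KB
video_bytes = videos_per_day * 2 * MB
ingress_tb_per_day = (photo_bytes + video_bytes) / TB

print(f"photos: {photo_bytes / TB:.0f} TB, videos: {video_bytes / TB:.0f} TB, "
      f"total: {ingress_tb_per_day:.0f} TB/day")
```

Videos dominate the total here, so the estimate is most sensitive to the assumed video size and share.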

Egress:

We have about 28B tweet views a day (see Total tweet-views below). We need to show every picture, but assuming the user only watches every third video on their timeline, egress would be:

(28B × 280 bytes) / 86,400 s of text ≈ 91 MB/s

+ (28B/5 × 200 KB) / 86,400 s of photos ≈ 13 GB/s

+ (28B/10/3 × 2 MB) / 86,400 s of videos ≈ 22 GB/s

Total ≈ 35 GB/s
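A short script reproduces the egress breakdown from the stated assumptions. The constant names are illustrative and units are decimal; with these inputs the text portion works out to roughly 91 MB/s, and the total rounds to 35 GB/s.

```python
SECONDS_PER_DAY = 86_400
TWEET_VIEWS_PER_DAY = 28_000_000_000

TEXT_BYTES = 280          # 140 characters at two bytes each
PHOTO_BYTES = 200_000     # 200 KB
VIDEO_BYTES = 2_000_000   # 2 MB

# Bytes per second for each kind of content.
text_rate = TWEET_VIEWS_PER_DAY * TEXT_BYTES / SECONDS_PER_DAY
photo_rate = TWEET_VIEWS_PER_DAY / 5 * PHOTO_BYTES / SECONDS_PER_DAY
# Every tenth tweet has a video, and only every third video is watched.
video_rate = TWEET_VIEWS_PER_DAY / 10 / 3 * VIDEO_BYTES / SECONDS_PER_DAY
total_rate = text_rate + photo_rate + video_rate

print(f"text:   {text_rate / 1e6:.0f} MB/s")
print(f"photos: {photo_rate / 1e9:.0f} GB/s")
print(f"videos: {video_rate / 1e9:.0f} GB/s")
print(f"total:  {total_rate / 1e9:.0f} GB/s")
```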

Total tweet-views

Assuming a user visits their timeline twice a day and visits five other pages that have 20 tweets each, the total tweet-view is:

200M DAU × ((2 + 5) × 20 tweets) = 28B/day
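Wrapping the view estimate in a small function makes it easy to see how the total moves when an assumption changes. The function name and parameters are hypothetical.

```python
def daily_tweet_views(daily_active_users, timeline_visits, other_pages, tweets_per_page):
    """Estimate tweet views per day from per-user browsing assumptions."""
    pages_per_user = timeline_visits + other_pages
    return daily_active_users * pages_per_user * tweets_per_page


views = daily_tweet_views(
    daily_active_users=200_000_000,
    timeline_visits=2,
    other_pages=5,
    tweets_per_page=20,
)
print(f"{views / 1e9:.0f}B tweet views per day")
```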

Step 3: Data Flow

This involves the system’s data model and how data will flow between the different components. Choosing a database system is also part of this. You can choose between these three:

1. Relational Databases: Relational databases store data in tables that are linked together through primary and foreign keys. These are a good choice if:

  • You’re building the first version of your system and aren’t completely sure about the data access patterns
  • You want to maintain zero data redundancy.

2. NoSQL Databases: This is a good option if your data model has no fixed schema.

3. Graph Databases: Graph databases are a good idea when you have many many-to-many relationships.

A possible database schema for Twitter can be as follows:

[Figure: database schema for Twitter]
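As one possible relational take on such a schema, the sketch below creates four tables in an in-memory SQLite database. The table and column names are assumptions derived from the functional requirements (post, follow, favorite), not the schema from the original figure.

```python
import sqlite3

# Hypothetical schema: one table per entity or relationship.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    content    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE favorites (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    tweet_id INTEGER NOT NULL REFERENCES tweets(tweet_id),
    PRIMARY KEY (user_id, tweet_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users (user_id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO tweets (user_id, content) VALUES (?, ?)",
             (2, "hello world"))
conn.execute("INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)", (1, 2))

# Timeline for alice: tweets written by the users she follows.
timeline = conn.execute(
    """
    SELECT t.content
    FROM tweets AS t
    JOIN follows AS f ON f.followee_id = t.user_id
    WHERE f.follower_id = ?
    ORDER BY t.created_at DESC
    """,
    (1,),
).fetchall()
print(timeline)
```

The follows and favorites tables are the many-to-many relationships; at Twitter-like scale these are the parts most likely to push the design toward partitioning or a different storage engine.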

Step 4: High-level Component design

You can’t design an entire system in one go. That’s why we split it into major high-level components and then break those down into a detailed design based on requirements. In this step, you sketch the main components of your system and how they are connected. Don’t go into the details yet.

Based on our estimations for Twitter, we will need a system that can handle all that load while storing data efficiently.

[Figure: high-level component design]

Step 5: Detailed design

Now that you have identified your core components, it’s time to dig deeper into them. You want to start by analyzing the different approaches to solving a given problem and the pros/cons of each potential solution.

It's also important to do tradeoff analysis at this stage. Questions like the following are commonly addressed during this step:

  • How much data do we need to cache to speed up the response time?
  • Where should we use a load balancer?
  • Do we need to partition data to distribute to multiple databases?
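To illustrate the partitioning question, here is a minimal sketch of hash-based sharding, one common way to spread data across several databases. The function name `shard_for`, the use of a user ID as the partition key, and the shard count are assumptions made for the example.

```python
import hashlib
from collections import Counter


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4
user_ids = [f"user-{i}" for i in range(1000)]
distribution = Counter(shard_for(uid, NUM_SHARDS) for uid in user_ids)
print(sorted(distribution.items()))
```

A plain modulo scheme like this reshuffles most keys when the number of shards changes, which is the tradeoff that techniques such as consistent hashing are meant to reduce.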

Step 6: Identify and resolve bottlenecks

With the detailed design done, the next step is identifying bottlenecks in the system and mitigating them. Bottlenecks can arise in areas such as traffic, data, storage, availability, redundancy, and backup.

Some questions to consider at this stage are:

  • Is there a single point of failure in this system? How do we remove it?
  • Do we have enough data replicas to keep serving users if we lose a few servers?
  • Do we have enough copies of our services to prevent a shutdown?
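The first and last questions can be asked of a simple inventory of components and their replica counts. The sketch below is a toy model; the component names, the counts, and both function names are hypothetical.

```python
def single_points_of_failure(replicas):
    """Return components that run as fewer than two copies."""
    return sorted(name for name, count in replicas.items() if count < 2)


def survives(replicas, failures):
    """True if every component still has at least one copy after the failures."""
    return all(count - failures.get(name, 0) >= 1
               for name, count in replicas.items())


deployment = {"load_balancer": 1, "app_server": 3, "database": 2, "cache": 2}

print(single_points_of_failure(deployment))
print(survives(deployment, {"app_server": 2}))
print(survives(deployment, {"load_balancer": 1}))
```

In this model the single load balancer is the weak spot: losing two of three app servers is survivable, but losing the only load balancer is not.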

Next steps for System Design

There you have it! A very simplified guide to System Design. Remember to keep your design simple. Things will not always go your way, so you may have to come back and make changes as you go.

If you are interested in exploring this further, check out Educative’s comprehensive course Grokking Modern System Design for Software Engineers & Managers. This course covers all the important concepts in web applications, microservices, and AWS architecture, all with a hands-on coding environment and more.

Happy learning!