Predictive scaling

Predictive scaling is the approach that most organizations would ideally take. You can often collect historical data on an application's workload and use it to anticipate demand. For example, an e-commerce website such as Amazon may see sudden traffic spikes, and predictive scaling helps you to add capacity before they arrive and so avoid latency issues. Traffic patterns may include the following:

  • Weekends have three times more traffic than weekdays.
  • Daytime has five times more traffic than nighttime.
  • Shopping seasons, such as Thanksgiving or Boxing Day, have 20 times more traffic than regular days.
  • Overall, the holiday season in November and December has 8 to 10 times more traffic than during other months.

You may have collected the preceding data using monitoring tools that track user traffic, and you can use it to predict when and how much to scale. Scaling may include planning to add more servers when the workload increases, or adding more caching. An e-commerce workload like this one tends toward higher complexity and provides plenty of data points to help us understand overall design issues. For such complex workloads, predictive scaling becomes more relevant.
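As a rough illustration of planning server counts from these patterns, the following minimal sketch turns the traffic multipliers listed earlier into a capacity figure. The baseline request rate, the per-server capacity, and the 20% headroom are made-up assumptions for the example, not measured values:

```python
import math

# Hypothetical figures, chosen only for illustration.
BASELINE_RPS = 200      # typical weekday load in requests per second
RPS_PER_SERVER = 100    # load one server can sustain comfortably

# Multipliers taken from the traffic patterns listed earlier.
MULTIPLIERS = {
    "weekday": 1,
    "weekend": 3,
    "holiday_season": 10,      # upper end of the 8 to 10 times range
    "peak_shopping_day": 20,
}

def servers_needed(period: str, headroom_pct: int = 20) -> int:
    """Return the server count for a period, including spare headroom."""
    expected_rps = BASELINE_RPS * MULTIPLIERS[period]
    # Integer arithmetic keeps the rounding predictable.
    return math.ceil(expected_rps * (100 + headroom_pct) / (100 * RPS_PER_SERVER))

plan = {period: servers_needed(period) for period in MULTIPLIERS}
print(plan)
```

With these assumed numbers, a peak shopping day needs many times the weekday fleet, which is why planning it in advance matters.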

Predictive auto-scaling is becoming increasingly popular. Historical data and trends are fed to prediction algorithms, which estimate in advance how much workload to expect at a given time. Using this forecast, you can configure your application to scale ahead of demand.
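To make the idea of feeding historical data to a prediction algorithm concrete, here is a deliberately simple sketch that forecasts load as the average of past observations for the same weekday and hour. Real predictive auto-scaling services use more sophisticated models; the function name and the sample data are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def forecast_by_hour_of_week(history):
    """Average past load for each (weekday, hour) slot.

    history: iterable of (weekday, hour, load) tuples, where weekday is
    0-6 (Monday is 0) and hour is 0-23.
    Returns a dict mapping (weekday, hour) to the mean observed load.
    """
    buckets = defaultdict(list)
    for weekday, hour, load in history:
        buckets[(weekday, hour)].append(load)
    return {slot: mean(loads) for slot, loads in buckets.items()}

# Made-up samples: three past Saturdays and three past Tuesdays at 14:00.
history = [
    (5, 14, 900), (5, 14, 1000), (5, 14, 1100),
    (1, 14, 300), (1, 14, 320), (1, 14, 340),
]
forecast = forecast_by_hour_of_week(history)
print(forecast[(5, 14)], forecast[(1, 14)])
```

Averaging by time slot captures the weekly and daily cycles described earlier, but it ignores long-term growth and one-off events, so treat it as a starting point only.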

To better understand predictive auto-scaling, look at the following metrics dashboard from the AWS predictive auto-scaling feature. This graph has captured historical CPU utilization data of the server, and based on that, has provided the forecasted CPU utilization:

Predictive scaling forecast

In the following screenshot, an algorithm suggests the minimum capacity that you should plan for in order to handle the traffic, based on the forecast:

Predictive scaling capacity plan

You can see that the minimum capacity varies at different times of the day. Predictive scaling helps you to optimize your workload based on these predictions. Because adding new resources takes time, predictive auto-scaling also helps to reduce latency and avoid outages by provisioning capacity before it is needed. If resources are added too late to handle a website traffic spike, the delay may cause a flood of requests and artificially inflated traffic, as users tend to retry their requests when they encounter slowness or outages.
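The link between a forecast and a capacity plan that allows for provisioning delay can be sketched as follows. This is an illustrative calculation, not the algorithm behind the AWS feature; the hourly forecast values, the per-server load, and the one-hour lead time are assumptions:

```python
import math

def capacity_plan(forecast_load, per_server_load, lead_hours=1):
    """Minimum server count for each hour, provisioned ahead of demand.

    forecast_load: forecast load per hour (for example, requests per second).
    per_server_load: load one server can handle at the target utilization.
    lead_hours: how many hours ahead capacity must already be in place.
    """
    plan = []
    for hour in range(len(forecast_load)):
        # Cover this hour and the lead window so servers are ready early.
        window = forecast_load[hour:hour + lead_hours + 1]
        plan.append(math.ceil(max(window) / per_server_load))
    return plan

# Hypothetical hourly forecast with a spike in the third and fourth hours.
forecast_load = [200, 250, 900, 1000, 400, 150]
plan = capacity_plan(forecast_load, per_server_load=100)
print(plan)
```

With a one-hour lead, the plan already raises capacity in the hour before the spike, so the new servers are running when the traffic arrives instead of starting while users are retrying.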

In this section, you learned about predictive auto-scaling, but sometimes, due to a sudden spike in the workload, you need reactive scaling. We will learn about this in the next section.