Handling the Holiday Season scale

Written by David | Feb 6, 2023 4:24:52 PM

You either love it or you hate it, but the period between Black Friday and Christmas will forever be the busiest time of the year for the e-commerce world. For our partners and their e-commerce managers, this often involves a huge rush to get tooling, such as Faslet's Size Me Up virtual assistant, up and running. For providers such as ourselves, this normally means a huge spike in traffic. How do we handle doubling our traffic within a week, without any downtime or huge costs? Let’s break it down below.

Observability

The most important thing when running a complex system is to make sure that your team is aware when something is wrong. It sounds simple, but too many tech companies don’t have automated error notifications that give their team the observability required to respond quickly to errors. At Faslet, we receive Slack notifications for every system error and automated high-priority emails whenever the system isn’t responding. This ensures that we know the status of our system at any given moment.
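To make the Slack part concrete, here is a minimal sketch of an error notifier, assuming a Slack incoming webhook. It is not our production code: the function names, the message format and the service name in the usage note are all made up for illustration, and the webhook URL is whatever Slack generates for your workspace.

```python
import json
import urllib.request


def build_slack_payload(service: str, error: Exception) -> dict:
    """Build the JSON body for a Slack incoming webhook message.

    The message format here is an illustrative assumption.
    """
    return {"text": f":rotating_light: {service}: {type(error).__name__}: {error}"}


def notify_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook supplied by the caller."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A short timeout keeps a slow Slack call from blocking the failing service.
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()
```

In a service, you would call something like `notify_slack(url, build_slack_payload("size-api", exc))` from a top-level exception handler, so every unhandled error reaches the team channel.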

Risk minimization

Although we trust our continuous deployment process and automated testing to make sure everything works, bugs happen, and things can randomly fail. We’re normally more than happy to release to production on a Friday evening, but in times of huge load, such as the week running up to Black Friday, we avoid releasing during the busiest hours. We’ll often finish a piece of work and then release it early the next morning, when traffic is lower. This means that on the off chance that something does go wrong, fewer end users are impacted, and we can also enjoy our evenings. Work-life balance is very important!
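A rule like this can also be written down as a small check that a deployment pipeline runs before releasing. The sketch below is only an illustration: the freeze dates and the 06:00 to 09:00 quiet window are assumed values, not our actual schedule, and `is_safe_to_release` is a hypothetical helper.

```python
from datetime import date, datetime, time


def is_safe_to_release(
    now: datetime,
    freeze_start: date,
    freeze_end: date,
    quiet_start: time = time(6, 0),   # assumed start of the low-traffic window
    quiet_end: time = time(9, 0),     # assumed end of the low-traffic window
) -> bool:
    """Return True if a release is allowed at `now`.

    Outside the high-load period, any time is fine. Inside it, releases are
    only allowed during the quiet morning window.
    """
    in_high_load_period = freeze_start <= now.date() <= freeze_end
    if not in_high_load_period:
        return True
    return quiet_start <= now.time() < quiet_end


# Illustrative high-load period around Black Friday 2022 (25 November).
HIGH_LOAD_START = date(2022, 11, 21)
HIGH_LOAD_END = date(2022, 11, 28)
```

With these values, a release on Thursday evening of Black Friday week is blocked, while the same release at 07:00 the next morning goes through.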

Backend services

Now for the nitty-gritty tech stuff. Since day one, Faslet has used ECS, Amazon’s Docker container service, which lets us easily run our system in the cloud. Over time, we’ve split a single monolithic backend in ECS into AWS Lambdas (cloud functions), with ECS acting as a gateway. Since Lambdas scale automatically, they are the right choice when you have a sharp increase in traffic. ECS, however, requires a bit more work. Right before Black Friday week, we ramped up the auto-scaling on our ECS cluster. And boy, was it needed! That week, our traffic doubled compared to our previous peak. Because of the combination of a scaled-up gateway and Lambdas handling many calls, we barely noticed. Even on Black Friday itself, our response times were less than 200 ms, and we noticed no service degradation at all.
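For readers wondering what "ramping up the auto-scaling" can look like, here is one possible sketch using AWS Application Auto Scaling with a target-tracking policy on CPU. This is an assumption about the approach, not a description of our exact setup: the cluster name, service name, task counts, CPU target and cooldowns are all placeholder values. The helper only builds the request parameters, so it runs without AWS credentials.

```python
def ecs_scaling_requests(
    cluster: str,
    service: str,
    min_tasks: int,
    max_tasks: int,
    cpu_target: float,
) -> tuple[dict, dict]:
    """Build parameters for Application Auto Scaling on an ECS service.

    Returns (scalable_target, scaling_policy), shaped for boto3's
    register_scalable_target and put_scaling_policy calls.
    """
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    scalable_target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    scaling_policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            # Scale out quickly, scale in slowly (assumed cooldowns, in seconds).
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 300,
        },
    }
    return scalable_target, scaling_policy


# Applying it would need boto3 and AWS credentials, roughly:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
target, policy = ecs_scaling_requests("example-cluster", "gateway", 4, 20, 50.0)
```

Raising the minimum task count ahead of a known peak means the gateway starts the week with headroom, instead of waiting for the policy to react to the first spike.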

A brief respite

As was always the plan, we’ve moved away from ECS and are now running entirely on Lambdas. We did this the week after Black Friday, to avoid affecting too many end users. Changing something this core to the architecture of your application all but guarantees some downtime, but we managed to keep it down to a few minutes early in the morning, minimizing the negative impact. Now, with only Lambdas left, we’re ready to scale our traffic tenfold!

Databases

As mentioned before, Faslet uses AWS Aurora Serverless as our primary database. The reason is that it’s faster, cheaper, more reliable and more scalable than anything a small team can manage on their own. As such, we actually didn’t have to do anything. The database didn’t even need to scale to handle the ton of traffic we threw at it.

Conclusion

In summary, scaling is hard, but it can be made easier by good practices and smart technology choices. Make sure your team has good observability and knows the status of their services at any moment. Embrace CI/CD, but don’t take unnecessary risks: no one wants to fix random deployment failures at 6pm on Christmas Eve! Unless it’s a critical fix, wait until the next workday morning or the following week. Use tech that scales easily. If you can use cloud functions from the start, do it. Serverless databases are underutilized and save a ton of maintenance overhead.