A few months ago, one of my customers complained about significant performance problems in one of my recent projects.
The response times of their appointment booking services were far beyond the agreed KPIs. Under peak conditions, these services showed response times of up to 30 seconds. The transaction volume on this service was high, and it was used by up to 15,000 users per day.
Assessment of this performance problem
I initially thought of running a load test on this service to reproduce the problem in their testing stage. After discussing the architecture and test data requirements with their appointment booking solution architects, I learned that the test data requirements were complex: we would have had to set up several thousand calendars in Exchange, and their testing stage was much smaller than production. Given the time criticality and severity of the issues, we decided to start with an analysis of their production environment.
Performance Analysis
Kubernetes is often the platform of choice when large enterprises make their IT services accessible to their customers, and there are good reasons for that: scaling, health checks, and high availability are quick to achieve on Kubernetes, and the platform can be scaled up or down with demand to prevent overprovisioning.
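As a rough illustration of what demand-based scaling looks like (the names and thresholds below are placeholders, not the customer's actual configuration), a HorizontalPodAutoscaler grows and shrinks a Deployment based on observed CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: appointment-booking        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: appointment-booking      # placeholder name
  minReplicas: 2                   # placeholder values
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale out when average CPU utilization exceeds 70%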
The appointment booking service was configured with a CPU limit of 200m (0.2 CPU cores) and a memory limit of 1 GB.
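Expressed as a simplified Deployment manifest, such a configuration looks roughly like the sketch below; the names and image are placeholders, not the customer's actual manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: appointment-booking                 # placeholder name
spec:
  replicas: 3                               # placeholder
  selector:
    matchLabels:
      app: appointment-booking
  template:
    metadata:
      labels:
        app: appointment-booking
    spec:
      containers:
      - name: booking
        image: registry.example.com/appointment-booking:1.0   # placeholder image
        resources:
          limits:
            cpu: "200m"      # 0.2 CPU cores; Kubernetes CPU is expressed in cores/millicores
            memory: "1Gi"
          # no explicit requests: Kubernetes then defaults each request to its limit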
In our customer's environment, we used Dynatrace to monitor the Kubernetes platform and all its services. Dynatrace was installed in Full-Stack monitoring mode, with the Dynatrace Operator, Dynatrace ActiveGates, and the OneAgent pods running. The OneAgents collected code-level insights for the appointment booking services, and an initial check of these metrics showed:
No method-level hotspots
No design-related issues
Fast database queries
Low CPU usage in the cluster
Low memory usage in the cluster
In Kubernetes, there is a CPU throttling metric that every performance engineer should include in their analysis. When you use CPU limits, Kubernetes ensures that a container cannot consume CPU beyond its limit: once the container has used up its quota within a scheduling period, it is throttled until the next period begins. This gives each container a fair share of CPU resources, but a container that is consistently throttled will show slow response times even when the cluster as a whole has plenty of idle CPU.
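In our case Dynatrace surfaced this metric out of the box. If you run a Prometheus-based monitoring stack instead, the same signal is available from cAdvisor's CFS counters, and an alerting rule on the ratio of throttled scheduling periods might look roughly like this (the 25% threshold and the labels are assumptions, not values from this project):

groups:
- name: cpu-throttling
  rules:
  - alert: HighCPUThrottling
    expr: |
      sum(rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (namespace, pod)
        /
      sum(rate(container_cpu_cfs_periods_total{container!=""}[5m])) by (namespace, pod)
        > 0.25
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} is CPU throttled in more than 25% of its scheduling periods"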
Root Cause and how we fixed it
For our appointment booking service, CPU throttling was very high, and this was our indicator that the limit configuration might be the root cause of the response time issues.
We changed the CPU limit to 1000m (1 CPU core) and the CPU request to 500m (0.5 CPU cores).
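In the container spec, the resources section therefore changed to something along these lines:

resources:
  requests:
    cpu: "500m"        # 0.5 CPU cores guaranteed to the container
  limits:
    cpu: "1000m"       # 1 full CPU core before throttling kicks in
    memory: "1Gi"      # memory limit unchanged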
After applying these changes, the response times of the appointment booking service returned to the agreed 3-second KPI, and the CPU throttling disappeared.
Lessons learned
Review the architecture to learn more about design and testing considerations.
Understand the current configuration before you start the performance analysis.
Collect metrics for the entire stack to make the correct conclusions.
CPU throttling can protect your Kubernetes cluster but also slow down the response times of your services.
The CPU request is your application's guaranteed minimum CPU capacity, and for testing response times under worst-case conditions, you can set limit=request.
Keep up the great work! Happy Performance Engineering!
Why not just disable the limits?