How To Avoid Response Time Alerts Triggered by Slow Clients

By neub9

Monitoring Website Performance

As part of our core streaming product, RudderStack receives HTTP requests from a diverse client base. Various SDKs let devices send events through RudderStack from users all over the world, over many different networks. Our engineering team closely monitors response latencies to ensure our SLAs are met and to discover anomalies in our system and in our clients. In our Go code, a middleware in the router measures the latency of each request. We collect the measurements with statsd, store them in InfluxDB, and track the 95th and 99th percentiles of response time. Kapacitor alerts notify us when latency is high.
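A minimal sketch of a latency-measuring middleware of this kind is shown below. It is an illustration under stated assumptions, not RudderStack's actual code: the names `Recorder`, `MeasureLatency`, `ingest`, and `exercise`, and the `/v1/track` path, are hypothetical, and `Recorder` stands in for whatever statsd client call reports the timing.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// Recorder is a hypothetical hook standing in for a statsd timing call.
type Recorder func(path string, elapsed time.Duration)

// MeasureLatency times everything that happens inside next. That includes
// any time the handler spends waiting for the request body to arrive.
func MeasureLatency(record Recorder, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		record(r.URL.Path, time.Since(start))
	})
}

// ingest is a stand-in event handler: it drains the body and replies 200.
func ingest(w http.ResponseWriter, r *http.Request) {
	_, _ = io.Copy(io.Discard, r.Body)
	w.WriteHeader(http.StatusOK)
}

// exercise sends one in-memory request through the middleware and returns
// the response status and the number of latency samples recorded.
func exercise(body string) (status int, samples int) {
	record := func(string, time.Duration) { samples++ }
	h := MeasureLatency(record, http.HandlerFunc(ingest))
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/track", strings.NewReader(body))
	h.ServeHTTP(rec, req)
	return rec.Code, samples
}
```

The timer starts when the handler is invoked and stops when it returns, so each request produces one sample that the metrics pipeline can aggregate into percentiles.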

These request latency alerts have been valuable for detecting database issues such as slow reads and writes. However, as our customer base grew, we began receiving high response latency alerts that traced back to a small subset of customers. On-call engineers spent a significant amount of time every day checking the related graphs for multiple alerts, only to find that no other metric indicated a problem on our end. As a result, the alert came to be treated as noise rather than as something actionable.

Upon further investigation, we discovered that the latency alerts were caused by network issues outside our control, not by a problem in RudderStack. We tested several hypotheses to determine the root cause and found that requests from slow clients were skewing our metrics. Go's HTTP server invokes a handler once the request headers have arrived and streams the body as the handler reads it, so a client that uploads slowly inflates the measured latency even when our own processing is fast. To address this, we added a body buffer middleware that reads the whole request body before the latency measurement starts. The measurement now begins only after the body has been fully transferred from the client, which removes the effect of slow clients from our metrics.
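The idea can be sketched as follows. This is an illustrative sketch rather than the middleware we ship: `BufferBody`, `timed`, `slowBody`, `drain`, and `compare` are hypothetical names, the 1 MiB limit is an arbitrary assumption, and `slowBody` merely simulates a client on a poor network by sleeping before each one-byte read.

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// BufferBody reads the entire request body (up to maxBytes) into memory
// before calling next, so handlers wrapped inside it never wait on the network.
func BufferBody(maxBytes int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		data, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxBytes))
		_ = r.Body.Close()
		if err != nil {
			http.Error(w, "could not read request body", http.StatusBadRequest)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(data))
		next.ServeHTTP(w, r)
	})
}

// timed stores how long next took; a stand-in for the latency middleware.
func timed(elapsed *time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		*elapsed = time.Since(start)
	})
}

// slowBody simulates a slow client: each Read yields one byte after a delay.
type slowBody struct {
	remaining int
	delay     time.Duration
}

func (s *slowBody) Read(p []byte) (int, error) {
	if s.remaining == 0 {
		return 0, io.EOF
	}
	if len(p) == 0 {
		return 0, nil
	}
	time.Sleep(s.delay)
	s.remaining--
	p[0] = 'x'
	return 1, nil
}

// drain is a stand-in handler that consumes the request body.
func drain(w http.ResponseWriter, r *http.Request) {
	_, _ = io.Copy(io.Discard, r.Body)
}

// compare sends the same slow upload through both middleware orderings and
// returns the latency the timer observed in each case.
func compare() (unbuffered, buffered time.Duration) {
	newReq := func() *http.Request {
		body := &slowBody{remaining: 5, delay: 20 * time.Millisecond}
		return httptest.NewRequest(http.MethodPost, "/v1/track", body)
	}
	// Timer only: the slow upload is counted as server latency.
	timed(&unbuffered, http.HandlerFunc(drain)).
		ServeHTTP(httptest.NewRecorder(), newReq())
	// Buffer outside the timer: the upload finishes before timing starts.
	BufferBody(1<<20, timed(&buffered, http.HandlerFunc(drain))).
		ServeHTTP(httptest.NewRecorder(), newReq())
	return unbuffered, buffered
}
```

The ordering is what matters: the buffering middleware has to wrap the timing middleware, not the other way around. Buffering holds each body in memory, which is why the sketch caps the size with `http.MaxBytesReader`.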

While the alerts were useful initially, we needed to adjust our approach to account for the expansion of our user base and the challenges that came with it. By implementing the body buffer middleware, we were able to address the issue without needing to remove the monitoring middleware, which saved us from unnecessary complexity and extra work.
