We might be seeing the light at the end of the tunnel with our performance woes at work. I did some profiling with pprof this morning and saw that a large amount of time was spent in context.Value(). Which is strange, given that this is just a way of retrieving values carried alongside context instances.
My initial suspicion was that tracing may have been involved. The tracing library we’re using carries spans (think of them as records of method calls) in the context. These spans eventually get offloaded to a service like Jaeger for us to browse.
We never got tracing working for this service, so I suspect all these spans were building up somewhere. The service wasn’t memory starved, but maybe the library was adding more and more values to the context, which acts like a linked list, and the service was just spending time traversing the list, looking for a value.
This is just speculation and warrants further investigation, maybe (it might be easier to just spend that effort getting tracing working). But after we turned off tracing, we no longer saw the CPU rise to 100%. When we applied load, the CPU remained pretty constant at around 7%.
So it’s a pretty strong signal that tracing not being offloaded is involved somehow.