The issues encountered are varied, but some common patterns have emerged:
Logging is pervasive in services and is expected to be cheap. However, older logging frameworks, synchronized loggers, short-lived logger objects, and eager evaluation of log arguments can all hurt performance. Upgrading from Apache's log4j 1 to log4j 2 with asynchronous logging can yield dramatically better performance, even a 4x throughput improvement in some cases.
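To illustrate the "function evaluations during logging" pitfall, here is a minimal sketch using the JDK's own java.util.logging (log4j 2 offers analogous parameterized and lambda-based APIs). The Supplier form defers building the message until the framework knows the level is enabled; the class and method names here are our own, for illustration only:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {
    static final Logger LOG = Logger.getLogger(LazyLogging.class.getName());
    static final AtomicInteger evaluations = new AtomicInteger();

    // Stands in for an expensive computation we only want on enabled log paths.
    static String expensiveSummary() {
        evaluations.incrementAndGet();
        return "summary";
    }

    // Returns how many times expensiveSummary() ran: once for the eager call
    // (the argument is built before fine() can check the level), and zero
    // times for the Supplier form, since FINE is disabled.
    static int demo() {
        evaluations.set(0);
        LOG.setLevel(Level.INFO);                      // FINE is now disabled
        LOG.fine("state=" + expensiveSummary());       // eager: always evaluated
        LOG.fine(() -> "state=" + expensiveSummary()); // lazy: skipped entirely
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("evaluations: " + demo()); // 1
    }
}
```

The same idea applies regardless of framework: pass the work as a supplier (or use parameterized messages) so disabled log statements cost almost nothing.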
java.security.SecureRandom methods can be very slow because they may block while gathering entropy, and they may also be synchronized: a double whammy. It may be surprising, but SecureRandom is used by java.util.UUID.randomUUID.
There are two ways to tackle this (there’s a slight synergistic effect when doing both):
Upgrading to a recent JRE will remove synchronization in SecureRandom.nextBytes.
Change the underlying logic that generates entropy, or use a different default entropy source. This is a complex discussion; this Synopsys blog entry has some of the details.
For one service, this resulted in a roughly 40% throughput improvement.
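When the IDs only need to be unique rather than unpredictable, a third option (an illustrative sketch, not necessarily the approach the service above used) is to assemble a valid version-4 UUID from ThreadLocalRandom instead of SecureRandom. Note the loud caveat: this is NOT cryptographically secure and must not be used for tokens, session IDs, or anything an attacker should not be able to guess.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class FastUuid {
    // NOT cryptographically secure: only for IDs where unpredictability
    // is not a security requirement (e.g., tracing or correlation IDs).
    public static UUID randomUuid() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long msb = rnd.nextLong();
        long lsb = rnd.nextLong();
        // Set the version (4) and variant (IETF) bits so the result is a
        // well-formed v4 UUID, matching what UUID.randomUUID() produces.
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L;
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID id = randomUuid();
        System.out.println(id + " version=" + id.version()
                + " variant=" + id.variant());
    }
}
```

Because ThreadLocalRandom is per-thread, this avoids both the entropy blocking and the lock contention described above.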
Throwing JVM exceptions can be slow, at times orders of magnitude slower than the non-exceptional path, though in other cases the effect is negligible. This blog post has an enlightening discussion of the issue.
Generally the best fix is to not throw the Exception; however, when that is not possible, there are workarounds, such as caching the Exception or reducing its stack trace.
One service at LinkedIn improved its throughput by 35% simply by not raising NumberFormatExceptions during String parsing. A similar optimization is available in the Google Guava tryParse methods.
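A hand-rolled, exception-free integer parser in the spirit of Guava's Ints.tryParse might look like the sketch below (the class and method names are our own). On malformed input it returns null instead of constructing and throwing a NumberFormatException:

```java
public class SafeParse {
    // Returns the parsed int, or null for malformed input, instead of
    // throwing NumberFormatException; avoids exception-construction cost
    // on hot paths where bad input is common.
    public static Integer tryParseInt(String s) {
        if (s == null || s.isEmpty()) return null;
        int i = 0;
        boolean negative = s.charAt(0) == '-';
        if (negative) {
            if (s.length() == 1) return null; // "-" alone is not a number
            i = 1;
        }
        long result = 0; // widen to long so int overflow is detectable
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return null;
            result = result * 10 + (c - '0');
            // Bail out early once the magnitude can no longer fit an int.
            if (result > (long) Integer.MAX_VALUE + 1) return null;
        }
        long value = negative ? -result : result;
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) return null;
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("123"));  // 123
        System.out.println(tryParseInt("12x"));  // null
    }
}
```

Callers then branch on null rather than wrapping the parse in a try/catch.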
ForkJoinPool is a concurrency framework for parallel processing. Per JDK-8080623, CPU spinning in java.util.concurrent.ForkJoinPool.awaitWork introduced a performance regression. One service at LinkedIn improved its throughput by 25% by refactoring away from the affected pool; alternatively, upgrading to a JRE that includes the fix resolves this as well.
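One possible shape for such a refactor, assuming the parallel work is simple enough to run on a plain fixed-size ExecutorService instead of a ForkJoinPool (this is an illustrative sketch, not the actual change made at LinkedIn):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class PoolRefactor {
    // Runs independent tasks on a fixed thread pool, which parks idle
    // workers rather than spinning the way the affected ForkJoinPool did.
    public static List<Integer> squares(List<Integer> inputs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = inputs.stream()
                    .map(n -> pool.submit(() -> n * n))
                    .collect(Collectors.toList());
            List<Integer> out = new ArrayList<>();
            for (Future<Integer> f : futures) {
                out.add(f.get()); // collect results in submission order
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squares(List.of(1, 2, 3))); // [1, 4, 9]
    }
}
```

Whether this trade-off is worthwhile depends on the workload; ForkJoinPool remains the better fit for fine-grained, recursively splitting tasks.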
Other classes affected by contention
We have additional rules to help improve multithreaded performance: prefer java.lang.StringBuilder to the synchronized java.lang.StringBuffer, prefer java.util.concurrent.ThreadLocalRandom to java.util.Random, prefer unsynchronized Maps to java.util.Hashtable, and others. Some of these are subtle concerns, but in multithreaded services under contention they can measurably affect tail (99th percentile) latency.
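The three preferences above can be shown together in one small sketch (class and method names are our own):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class ContentionFriendly {
    // StringBuilder instead of StringBuffer: no synchronization is needed
    // for a builder confined to a single method.
    public static String buildKey(String prefix, int id) {
        return new StringBuilder(prefix).append('-').append(id).toString();
    }

    public static void main(String[] args) {
        // ThreadLocalRandom instead of java.util.Random: each thread gets
        // its own generator, so threads never contend on a shared seed.
        int n = ThreadLocalRandom.current().nextInt(100);

        // ConcurrentHashMap instead of Hashtable: fine-grained internal
        // locking rather than one lock around every operation.
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        counts.merge(buildKey("user", n), 1, Integer::sum);
        System.out.println(counts);
    }
}
```

Under a single thread these choices are roughly a wash; the payoff appears when many threads hit the same objects concurrently.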
We’ve created additional rules, including some for Java Reflection (sometimes slow: cache results when possible) and regular expression matches (hand-parsed routines can be faster). These are straightforward and generally minor concerns and so have higher thresholds before they are flagged, but they can contribute to slow methods if called often.
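As a sketch of the "cache reflection results" advice, a lookup table keyed by class and method name avoids repeating the (relatively slow) getMethod search on every call; the names here are illustrative, not from any LinkedIn codebase:

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReflectionCache {
    private static final Map<String, Method> CACHE = new ConcurrentHashMap<>();

    // Looks up a zero-argument public method once per (class, name) pair;
    // subsequent calls reuse the cached Method and only pay invoke() cost.
    public static Object invoke(Object target, String methodName) throws Exception {
        String key = target.getClass().getName() + "#" + methodName;
        Method m = CACHE.computeIfAbsent(key, k -> {
            try {
                return target.getClass().getMethod(methodName);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException(e);
            }
        });
        return m.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invoke("hello", "length")); // 5
    }
}
```

For truly hot paths, java.lang.invoke.MethodHandle or avoiding reflection altogether can be faster still; caching is simply the lowest-effort fix.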
In this post, we’ve given an overview of our “common issue detection” feature for CPU profiling and described some of the improvements we’ve experienced. We expect even more improvements as additional issue patterns are found and added. We hope that you see the benefit of such a feature and can apply something similar to your own systems.
The development and use of this feature at LinkedIn has been a significant cross-team effort. We wish to thank Brandon Duncan, Josh Hartman, Jason Johnson, Todd Palino, Chris Gomes, Yi Feng, their respective teams, and of course, all the users of the framework.