We are the IntelliJ IDE team at Google. We develop IntelliJ code (plugins and patches) to integrate seamlessly with Google’s unique build system, and to make IntelliJ’s performance scale to Google’s huge, rapidly changing codebase.
We recently investigated whether using Solid State Drives (SSDs) could improve indexing performance for large projects. We decided to share our analysis and results here for the benefit of our Engineering Tools blog readers.
First, what do we mean by indexing? IntelliJ, like any IDE, maintains a datastore of source code meta-information that powers autocompletion, hyperlinking in source code, refactoring and a plethora of other features that operate on code. The process of creating this datastore is called ‘indexing’.
But why do we want to speed up indexing? Indexing occurs whenever a new IntelliJ project is created or source files change outside of IntelliJ (such as during a source control sync). As you can imagine, these are frequent operations. Given the scale and size of projects in a monolithic codebase like Google’s, the changed delta can be large, so indexing can take a significant amount of time. While it runs, indexing effectively blocks users from many of the rich features the IDE has to offer, and we want to minimize this blocked time.
Why SSDs? Being aware that indexing is I/O intensive, we thought that using faster storage devices could be a silver bullet to improve indexing performance. But when we did some experiments, we learned otherwise.
SSDs store data in solid state memory and have no electromechanical parts, unlike a Hard Disk Drive (HDD). They generally offer lower latencies for data transfer. For random reads, SSDs perform 50x-200x better than typical HDDs, and they outperform HDDs for sequential reads as well. Though SSD writes are not as fast as SSD reads, write operations on modern SSDs are still faster than on typical HDDs.
Now doing these experiments is one thing, but how do we measure the results? Ideally we would determine the total mix of read and write operations performed by the IntelliJ process during indexing and compute the effective performance benefit from that. But while there are tools like blktrace that offer a process-level drill-down of I/O operations, they cannot easily account for the optimizations the file system makes before data reaches the disk. For example, many write operations are buffered in the page cache and flushed to disk later by the kernel’s pdflush threads, so I/O monitoring tools attribute those writes to pdflush rather than to IntelliJ.
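As a small illustration of the per-process view and its limits, Linux exposes cumulative I/O counters for each process under /proc/&lt;pid&gt;/io. A quick sketch (assuming a Linux /proc layout; substitute the IntelliJ PID for the shell’s own):

```shell
# Print the cumulative I/O counters the kernel keeps for a process.
# $$ is this shell's own PID; replace it with the IntelliJ PID.
pid=$$
cat /proc/$pid/io
# Typical fields: rchar/wchar count bytes passed through read()/write()
# syscalls, while read_bytes/write_bytes count bytes that actually hit
# the storage layer. Buffered writes flushed later by the kernel's
# flusher threads are charged to those threads, not to this process --
# which is exactly the attribution problem described above.
```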
Long story short, we think the most reliable way to measure indexing improvements for now is to actually trigger a build/indexing run for sample projects with and without SSDs and measure the elapsed time in each case.
Now on to the fun stuff. Here are some details of the machine we used for sampling:
CPU: Intel quad core 2.4GHz
SSD: Kingston 128GB
HDD: WDC 500GB 7200RPM SATA-II
RAM: 8 GB DDR2, 800MHz (1.2ns)
JVM: 64-bit OpenJDK
IntelliJ version: 10.5
File system: ext3
For the experiments, we chose sample projects that represented the set of projects we’re trying to optimize indexing performance for. We ran these experiments using SSDs in one case and HDDs in the other, keeping everything else constant. These are the results averaged out for 10 runs for one such sample project:
Table 1: Index times (“scanning files to index”) of IntelliJ Community Edition code using an SSD and a normal hard disk
From our experiments we did not find any significant improvement in indexing time when SSDs were used. SSDs did make the first run about 25% faster; that is the case where the source files of interest are not yet cached by the file system (for example, right after a reboot). On subsequent indexing runs, which are the practical case, this advantage shrinks sharply. So the cost-benefit of this silver bullet does not really come through for us.
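The cold-cache versus warm-cache comparison can be reproduced with a simple timing harness along these lines. This is a sketch of the methodology, not the exact harness we used: the command under test is a placeholder for whatever triggers indexing in your setup, and dropping the page cache via /proc/sys/vm/drop_caches requires root, so the script warns instead of failing without it.

```shell
#!/bin/sh
# time_runs CMD...: run CMD twice, timing a cold-cache pass and a
# warm-cache pass. CMD stands in for whatever triggers indexing.
time_runs() {
    sync
    # Evict the page cache so the first pass reads from disk (needs root).
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null \
        || echo "note: could not drop caches (not root); first pass may be warm"

    start=$(date +%s)
    "$@"
    echo "cold-cache pass: $(( $(date +%s) - start ))s"

    start=$(date +%s)
    "$@"                  # same command again; files are now cached
    echo "warm-cache pass: $(( $(date +%s) - start ))s"
}
```

For example, `time_runs some-index-command` prints both timings; the gap between them is the file-system caching effect described above.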
However, this doesn’t mean SSDs cannot possibly make things better. We feel SSDs could deliver a greater indexing performance boost if:
- the underlying file system is highly optimized for SSDs (like ZFS on Solaris)
- the workload performs more disk reads than disk writes, since random reads are where SSDs shine most
But some of these requirements are not trivial for us to meet. Until then, we will explore other ideas to improve IntelliJ performance and post any learnings that we feel would benefit our readers and the community. Stay tuned.
– Chandra Sekhar Pydi, Siddharth Priya, Abhishek Sheopory