Benchmarking CPU utilization with various tools
For the initial version of Query Analyzer, we used MySQL to store the metrics (essentially time-series data). There are two tables: query_history and query_info. The query_history table stores the information from the query hashmap. Its columns are hostname, checksum, timestamp, count, querytime, user, and db. The primary key is (hostname, checksum, timestamp); the table is range-partitioned by timestamp and subpartitioned by key on hostname. It has indexes on (hostname, timestamp, querytime, count) and on checksum.
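The schema described above can be sketched in MySQL DDL roughly as follows; the column types, partition boundaries, and subpartition count are illustrative assumptions, not the actual production schema:

```sql
-- Sketch of query_history; types and partition values are assumed.
CREATE TABLE query_history (
  hostname  VARCHAR(64)     NOT NULL,
  checksum  BIGINT UNSIGNED NOT NULL,  -- query fingerprint hash
  timestamp INT UNSIGNED    NOT NULL,  -- collection interval (epoch seconds)
  count     INT UNSIGNED    NOT NULL,  -- executions in the interval
  querytime DOUBLE          NOT NULL,  -- total query time in the interval
  user      VARCHAR(64),
  db        VARCHAR(64),
  PRIMARY KEY (hostname, checksum, timestamp),
  KEY idx_host_ts (hostname, timestamp, querytime, count),
  KEY idx_checksum (checksum)
)
PARTITION BY RANGE (timestamp)
SUBPARTITION BY KEY (hostname)
SUBPARTITIONS 4 (
  PARTITION p202401 VALUES LESS THAN (1706745600),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

Note that MySQL requires every column in the partitioning and subpartitioning expressions to be part of each unique key, which the (hostname, checksum, timestamp) primary key satisfies here.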
The query_info table stores the query metadata. Its columns are hostname, checksum, fingerprint, sample, first_seen, mintime, mintime_at, maxtime, maxtime_at, is_reviewed, reviewed_by, reviewed_on, and comments. It has (hostname, checksum) as the primary key and an index on checksum.
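A corresponding sketch for query_info, again with assumed column types:

```sql
-- Sketch of query_info; types are assumed, column names from the text.
CREATE TABLE query_info (
  hostname    VARCHAR(64)     NOT NULL,
  checksum    BIGINT UNSIGNED NOT NULL,
  fingerprint TEXT,                      -- normalized query template
  sample      TEXT,                      -- one literal example of the query
  first_seen  DATETIME,
  mintime     DOUBLE,                    -- fastest observed execution
  mintime_at  DATETIME,
  maxtime     DOUBLE,                    -- slowest observed execution
  maxtime_at  DATETIME,
  is_reviewed TINYINT(1) DEFAULT 0,
  reviewed_by VARCHAR(64),
  reviewed_on DATETIME,
  comments    TEXT,
  PRIMARY KEY (hostname, checksum),
  KEY idx_checksum (checksum)
);
```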
So far, we have not faced any issues with using MySQL for metrics. However, when plotting the query trend graph over a long time range, we occasionally observed some latency. To overcome that, we are planning to emit the metrics from MySQL to our internal monitoring tool, inGraphs.
The agent needs to be run with sudo. To mitigate potential security issues, you can instead grant the agent the elevated capability "cap_net_raw". Further, by setting the execute permission only for a particular user (chmod 100 or 500), you can restrict the agent to run under a specific user without sudo. Refer to https://linux.die.net/man/7/capabilities for more details.
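The setup above can be done with setcap and standard file permissions; the path and user name below are placeholders, not the actual binary or account:

```shell
# Grant raw-socket capability so the agent can run without sudo.
sudo setcap cap_net_raw+ep /path/to/agent
sudo getcap /path/to/agent   # verify the capability was applied

# Restrict execution to a single user: make that user the owner
# and set owner-only read/execute (chmod 500).
sudo chown agentuser /path/to/agent
sudo chmod 500 /path/to/agent
```

Note that file capabilities are stored in the binary's extended attributes, so they must be reapplied after the agent is upgraded or replaced.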
The benefits of Query Analyzer have been numerous. It allows our database engineers to identify problematic queries at a glance, compare week-over-week overlays of query activity, and troubleshoot database slowdowns quickly and efficiently. Developers and business analysts are able to visualize query trends, check the query load in a staging environment before deployment, and obtain per-table and per-database metrics such as the number of inserts, updates, and deletes, which they can use to analyze the business. From a security standpoint, Query Analyzer alerts us whenever a new query hits a database, and we can also audit the queries that access sensitive information. Lastly, analyzing the query load allows us to ensure that queries are distributed evenly across servers, thereby optimizing our hardware, and to conduct capacity planning more accurately.
While a timeline has not yet been defined, we plan to eventually open source Query Analyzer and hope that it will be useful to others as well.
I would like to thank the MySQL team at LinkedIn: Basavaiah Thambara and Alex Lurthu for design review, Kishore Govindaluri for developing the UI, and Naresh Kumar Vudutha for code review. Finally, this project would not have been possible without support from LinkedIn’s data leadership; thanks for all of your support.