Speeding Ahead with a Systematic Approach to Web Performance | by Mihir Mathur | Apr, 2021


From replaying shared rides on a map, to solving physical safety problems in real-time, to managing a fleet of thousands of bikes and scooters, to viewing the trajectories of autonomous cars—frontend services at Lyft support a plethora of diverse use cases.

As Lyft has grown over the past decade, so has the complexity of our business needs. However, one core requirement underlies all of these diverse needs: high-performance web applications.

We sat down with our senior and staff engineers, who are well versed in the history of frontend engineering at Lyft, to understand how we have navigated performance challenges and built an ecosystem that supports a myriad of use cases needing performant web applications. By looking back at our journey of building 100+ high-performance frontend microservices at Lyft, we’ve distilled our learnings into what we call the Hierarchy of Web Performance Needs: a system for strategically identifying the most impactful performance needs of an organization building web applications.

Hierarchy of Web Performance Needs

In this post we’ll describe this framework and give a glimpse of our ever-evolving frontend performance stack. This framework could help engineers who know the best practices for web development but have grappled with questions such as:

  • How do I quantify and measure the business impact and prioritize performance improvements over feature development?
  • What kind of performance tools/techniques do I use or build first?
  • How do I influence the culture of an entire organization to value performance?

The popular adage, “If you can’t measure it, you can’t improve it,” holds true for web performance.

About 6 years ago, one of the first web performance investments made at Lyft was capturing, viewing, and analyzing real user monitoring (RUM) performance data. It worked by running a script in each web app that used the (now deprecated) Performance Timing API to record the timing of events such as requestStart, domLoaded, and domInteractive. This data was sent asynchronously to a server-side analytics tracking endpoint using Navigator.sendBeacon() on page load and then stored in our data warehouses for analysis.
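As a rough illustration of this kind of collection script, the sketch below derives a few durations from navigation timestamps and hands the serialized payload to a sender function. It uses the newer Navigation Timing Level 2 field names rather than the deprecated API. The `buildTimingPayload` and `reportTimings` helpers, the payload shape, and the `/analytics/rum` URL mentioned afterwards are hypothetical and are not Lyft's actual implementation.

```typescript
// Subset of PerformanceNavigationTiming fields used here
// (all in milliseconds relative to the start of navigation).
interface NavigationTimestamps {
  requestStart: number;
  responseStart: number;
  domInteractive: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

// Hypothetical payload shape for an analytics endpoint.
interface TimingPayload {
  service: string;
  serverResponseMs: number;
  domInteractiveMs: number;
  domContentLoadedMs: number;
  loadMs: number;
}

// Turn raw timestamps into rounded durations that are easy to chart.
function buildTimingPayload(service: string, t: NavigationTimestamps): TimingPayload {
  return {
    service,
    // Time between sending the request and receiving the first byte.
    serverResponseMs: Math.round(t.responseStart - t.requestStart),
    domInteractiveMs: Math.round(t.domInteractive),
    domContentLoadedMs: Math.round(t.domContentLoadedEventEnd),
    loadMs: Math.round(t.loadEventEnd),
  };
}

// The sender is injected so the same code can use navigator.sendBeacon in a
// browser and a stub in tests. Returns whatever the sender reports.
type Sender = (url: string, body: string) => boolean;

function reportTimings(send: Sender, url: string, payload: TimingPayload): boolean {
  return send(url, JSON.stringify(payload));
}
```

In a browser, `send` could be `(url, body) => navigator.sendBeacon(url, body)` with a URL such as `/analytics/rum`, and the timestamps could come from `performance.getEntriesByType('navigation')[0]`, read once the load event has finished so that `loadEventEnd` is populated.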

Last year, we started recording the new standard of web performance metrics: Google’s Core Web Vitals. To easily record these metrics across our array of applications, we created a wrapper around the web-vitals library that sends metrics to our analytics endpoint and can be used like this:

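import React from 'react';
import WebVitalsTracker from '@lyft/web-vitals-tracker';

export class App extends React.Component<AppProps> {
  render() {
    return (
      <div>
        {/* Reports Core Web Vitals for this service to the analytics endpoint */}
        <WebVitalsTracker sendingService='myservice' />
      </div>
    );
  }
}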
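
The wrapper's internals are not shown in this post. As a loose sketch of what the reporting side of such a wrapper might do, the helper below turns a metric reading into an analytics event and labels it using Google's published thresholds for LCP, FID, and CLS. The `VitalReading` and `VitalEvent` shapes and the `rate` and `toVitalEvent` names are hypothetical.

```typescript
type VitalName = 'LCP' | 'FID' | 'CLS';
type Rating = 'good' | 'needs-improvement' | 'poor';

// Hypothetical shapes: a raw reading and the event sent to analytics.
interface VitalReading {
  name: VitalName;
  value: number;
}

interface VitalEvent extends VitalReading {
  sendingService: string;
  rating: Rating;
}

// [upper bound for "good", upper bound for "needs-improvement"].
// LCP and FID are in milliseconds; CLS is a unitless score.
const THRESHOLDS: Record<VitalName, [number, number]> = {
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
};

function rate(name: VitalName, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[name];
  if (value <= good) return 'good';
  return value <= needsImprovement ? 'needs-improvement' : 'poor';
}

// Attach the service name and rating so dashboards can group by either.
function toVitalEvent(sendingService: string, reading: VitalReading): VitalEvent {
  return { ...reading, sendingService, rating: rate(reading.name, reading.value) };
}
```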

Once we collected all this data, we needed tools to examine it and gain meaningful insights. We use two third-party tools for this purpose:

  • Grafana: For real-time observability of metrics and for integrating with alerts.
  • Mode: For advanced analysis of trends of data (stored in our Hive clusters) using custom Presto queries.

Core Web Vitals metrics on a Mode report for a service

Equipped with tools to capture and analyze performance data, one might know some actionable steps that could improve a metric. But simply knowing action items for improving performance metrics in an organization with lots of feature work on the roadmap is often not enough. How does one get buy-in from their team and leadership for dedicating time and engineering resources to improve system performance?

One approach Lyft takes is joining the performance data with key business metrics. Articulating a performance hypothesis in the format: “increasing <perf_metric> by X% would increase/decrease <business_metric> by ~Y%” to stakeholders can help in prioritizing performance work.
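
One way to start exploring such a hypothesis is to bucket sessions by a performance metric and compare a business metric across the buckets. The sketch below does this for LCP and conversion rate. The `Session` shape, the bucket boundaries, and the function name are illustrative assumptions, and a correlation found this way would still need an experiment to establish causation.

```typescript
// Hypothetical joined record: one row per session with a performance
// metric and a business outcome.
interface Session {
  lcpMs: number;
  converted: boolean;
}

interface BucketStats {
  label: string;
  sessions: number;
  conversionRate: number;
}

// boundariesMs must be non-empty and ascending. For example, [2500, 4000]
// yields three buckets: <=2500ms, <=4000ms, and >4000ms.
function conversionByLcpBucket(sessions: Session[], boundariesMs: number[]): BucketStats[] {
  const last = boundariesMs[boundariesMs.length - 1];
  const labels = [...boundariesMs.map((b) => `<=${b}ms`), `>${last}ms`];
  const counts = labels.map(() => ({ total: 0, converted: 0 }));

  for (const s of sessions) {
    // First boundary the session fits under; -1 means the overflow bucket.
    const idx = boundariesMs.findIndex((b) => s.lcpMs <= b);
    const bucket = counts[idx === -1 ? counts.length - 1 : idx];
    bucket.total += 1;
    if (s.converted) bucket.converted += 1;
  }

  return labels.map((label, i) => ({
    label,
    sessions: counts[i].total,
    conversionRate: counts[i].total === 0 ? 0 : counts[i].converted / counts[i].total,
  }));
}
```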

However, forming such hypotheses is easier said than done. First, engineers should be equipped with data: both for performance and the business. Second, the causal relationship between a performance metric and a business metric may not always be clear. The first problem can be solved by having a culture of data transparency. One tool that can help in democratizing data access to all internal stakeholders is Lyft’s open-source data discovery engine Amundsen.

For the second problem, it may help to think about how users would be affected if some performance metric were to improve significantly. Can they do more in less time? Are they more likely to revisit the site? Will they recommend the app to more people?

Some of the most impactful performance work that can be done, especially for consumer web products, is making web pages load as quickly as possible. A study recently found that slow-loading web pages caused a significant increase in users’ blood pressure and led to stress. Moreover, forming a performance hypothesis that ties a page load metric (e.g., Largest Contentful Paint, Speed Index, Time to Interactive) to business value (e.g., sales $, sign ups, cost savings) can be relatively straightforward since there are several examples on which hypotheses can be based.

For instance, in 2015 when we had an Angular 1.3 frontend, a project was undertaken to make Lyft’s driver sign up page load faster. The hypothesis was that a reduction in page load time would lead to increased driver sign ups. A series of improvements such as the introduction of webpack for bundle splitting, moving to React for faster rendering (among other reasons), enabling server-side rendering, and excess CSS removal were introduced. These improvements led to a reduction in page load time from over 2 seconds to a few hundred milliseconds, which led to a 9% uptick in driver sign ups at the end of the funnel.

ride.lyft.com: One of Lyft’s user-facing web applications

At Lyft, we do several load-time optimizations, such as server-side rendering, code-splitting our apps, Brotli compression for serving static files, pre-fetching content, dynamically loading expensive JavaScript, using fall-back web fonts, and setting up libraries to tree-shake, among many others. Most of these optimizations happen at the build system level.

Despite all the optimizations that each frontend service inherits, we sometimes mess up. For example, we’ve had cases where duplicate libraries get packaged into different JS bundles of the same app. Thanks to our tooling, we are able to keep an eye on regressions. For instance, we record bundle sizes during each build, and use webpack-bundle-analyzer to periodically audit the bundles of our services and shave off as many kilobytes of JavaScript as possible.
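
A size check of this kind could be as simple as diffing the bundle sizes recorded by two builds. The sketch below flags bundles that grew by more than a chosen percentage. The record format, the function name, and the 5% default threshold are assumptions for illustration, not a description of Lyft's tooling.

```typescript
// Bundle name -> size in bytes, as recorded by a build.
type BundleSizes = Record<string, number>;

interface Regression {
  bundle: string;
  beforeBytes: number;
  afterBytes: number;
  growthPct: number;
}

function findSizeRegressions(
  before: BundleSizes,
  after: BundleSizes,
  maxGrowthPct: number = 5,
): Regression[] {
  const regressions: Regression[] = [];
  for (const bundle of Object.keys(after)) {
    const afterBytes = after[bundle];
    const beforeBytes = before[bundle];
    // Bundles with no baseline (new or previously empty) are skipped here;
    // a real check might report them separately.
    if (beforeBytes === undefined || beforeBytes === 0) continue;
    const growthPct = ((afterBytes - beforeBytes) / beforeBytes) * 100;
    if (growthPct > maxGrowthPct) {
      regressions.push({ bundle, beforeBytes, afterBytes, growthPct });
    }
  }
  return regressions;
}
```

A build step could fail, or post a warning on the pull request, whenever the returned list is non-empty.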

While load-time optimizations are often easy to relate to business metrics, the hypothesized impact might not always materialize. For example, hoping to improve conversion rates of one of our user-facing pages, we A/B tested a server-side fix that lowered Time to First Byte (TTFB) by 100ms at p50 and over 3 seconds at p95. Conversion rates on that page barely budged. Even though we shipped that change, we learned that load-time optimizations may not always move the business needle.
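
Percentiles such as p50 and p95 summarize different parts of a latency distribution, which is how a single fix can shave 100ms off the median yet seconds off the tail. As a small sketch, the helper below computes them with the nearest-rank definition, which is one of several common definitions and is not necessarily the one used in our analysis.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile of an empty sample set');
  if (p <= 0 || p > 100) throw new Error('p must be in the range (0, 100]');
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

Comparing `percentile(control, 95)` with `percentile(treatment, 95)` for the two arms of an experiment shows the tail improvement, while the same comparison at `p = 50` shows the change for the median user.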

With some of the optimizations mentioned above, a website would feel snappy during load. But to truly delight users, every possible interaction on an application should feel instantaneous (i.e., there should be visible responses to each input within 100ms). We strive to create such experiences.

However, having buttery smooth and fast interactions is difficult if an application is data- or compute-intensive. One such example is the app used by our support agents for resolving problems for riders and drivers. This heavyweight application provides an interface for real-time chat and phone communication coupled with a CRM for quickly sifting through multiple users’ rides, payments, or support history concurrently. Another example of a frontend service that is compute-intensive on the client side is our internal application for operations teams to manage fleets of bikes and scooters. Users need to quickly zoom in and out while viewing a lot of information overlaid on a map.

Our internal app for managing fleets of bikes and scooters

These services, among many others, are used by each internal user for several hours a day. Therefore, enabling fast interactions is vital to their productivity. A few things we do to improve run-time performance:

  • Tighten the long-tail of API call latencies: By examining latency data for each API call, the slowest requests to the server can be prioritized for improvement.
  • Batch Requests: The number of egress requests can be significantly reduced by batching. For example, metrics and logs can be sent to an analytics endpoint in batches instead of individually.
  • Smart Data Fetching: Use GraphQL or paginated APIs for fetching data and prioritize data fetching for above-the-fold components.
  • Memoizing: Skip unnecessary re-renders of frequently rendering components with React.memo(), and cache expensive computed values with the useMemo hook.
  • Profiling components/interactions: Dissect the rendering, scripting, and painting time of complex components or interactions to figure out the best optimizations.
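
As an illustration of the batching idea from the list above, the sketch below buffers events and sends them in one request once a size limit is reached. The `EventBatcher` class, its parameters, and the injected `send` callback are hypothetical. A production version would typically also flush on a timer and when the page is hidden.

```typescript
class EventBatcher<T> {
  private buffer: T[] = [];
  private readonly maxBatchSize: number;
  private readonly send: (batch: T[]) => void;

  // send is injected so the transport (fetch, sendBeacon, a test stub)
  // stays outside this class.
  constructor(maxBatchSize: number, send: (batch: T[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.send = send;
  }

  // Queue an event and flush automatically once the batch is full.
  add(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) this.flush();
  }

  // Send whatever is buffered as a single batch; a no-op when empty.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```

With a batch size of 20, for example, 100 logged events would result in 5 requests instead of 100.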

React profiler and Chrome DevTools are great tools for run-time performance profiling. These profilers, along with a solid foundation of measuring and monitoring tools that let users record any run-time metrics and visualize over time, can help to pinpoint the slowest parts of an application and make prioritization easier.

Once web pages load blazingly fast and every interaction completes with lightning speed, what should be the next focus? The next step is to ensure that every new application built by the company inherits instrumentation for performance monitoring, has fast load speeds, and has great run-time performance, all with minimal effort from engineers.

This is at the top of the pyramid because one can condense the learnings from all the solved performance problems and extract them into the build system or into reusable primitives. A new engineer with very little experience could then automagically write high performance frontend code using the primitives.

Furthermore, in an ideal performant web infrastructure, it should be very hard to merge non-performant code. This can be achieved through a combination of abstractions, processes, and education. Some of the building blocks of our performant infrastructure and culture are:

  • lyft-node-service: Our Next.js-based infrastructure makes it easy to spin up a new frontend service with most of the performance optimizations built in.
  • Lyft Product Language Design System: Our design system and its accompanying library of unified, high-performance, accessible UI components provide building blocks for performant applications.
  • Plugins and Migrations: We’ve built an internal plugin system that lets engineers easily share functionality with other frontend services across Lyft, so a performance optimization implemented by one team (e.g., image compression) can easily be distributed to every other service. Migrations are jscodeshift scripts that can apply changes or upgrades to all our frontend services. For instance, a performance change that needs a library update or some code edits can be applied in an automated way to all services so that, from the engineers’ perspective, the change is not breaking. This talk by Andrew Hao explains Plugins and Migrations in more detail.
  • Frontend Performance Force: A working group of engineers from different teams who want to level up Lyft’s web performance. This group identifies new performance areas to focus on, shares learnings, creates educational resources, and strives to build a culture that prioritizes performance at Lyft.

The hierarchy of web performance needs presented here is a heuristic for prioritization based on our learnings at Lyft. A question one might ask is: Could the pyramid be inverted? I.e., prioritize building a shared performant infrastructure and tooling first, then load-time and run-time optimizations, followed by measurement and monitoring?

One of the problems with not investing in measurement tools first is that prioritization would not be rooted in data, and it would be harder to justify business value. Moreover, it can be challenging in engineering resource-constrained organizations with lots of feature work on the roadmap to embed performance features in the shared infrastructure (if there is one).

Further, one of the benefits of prioritizing individual run-time and load-time optimizations over performant infrastructure work is that an organization can learn which optimizations are important enough to be extracted into the infra layer. However, one caveat in the hierarchy we presented is that if a web application has a captive audience (e.g., internal tools, enterprise software), then run-time optimizations can be prioritized higher than load-time optimizations.


