How Slack unified performance metrics across Desktop, iOS, and Android
With the fast pace of life today, people expect information to be available to them at the speed of light, regardless of which device or apps they’re using. Through the power of social media, their experiences — both good and bad — are instantly shared with the world, and many of those tweets, comments, and reviews are about apps being slow. As an app developer looking at this feedback, I ask myself:
- What do they mean by slow?
- Where do they notice the issue in the app?
- Do other users or other devices experience the same?
- Is it performing better or worse for them over time?
To answer all these questions, performance metrics are essential to translate subjective impressions to quantitative measurements that app developers can continuously monitor and improve with each iteration.
In order to reflect the most frequent user actions and their perception of the app’s responsiveness, we picked a set of performance metrics to capture users’ experience, including how long it takes to launch the app, how long it takes to switch to a channel, and the frame rates as they navigate the app. We transformed formerly distinct performance indicators on each client app to a set of unified metrics. We will discuss WHY we need cross-platform metrics, with a deep dive into HOW we implemented the launch time metrics, and WHAT we learned along the journey.
Why do unified cross-platform metrics matter?
Due to the fast-paced nature of Slack iterations, we were tracking performance separately for each client platform. On Desktop, Android, and iOS, there were different metrics vocabularies and definitions. For instance, we measured app launch time on both iOS and Android, but we called it cold_start_time_authenticated on iOS and cold_start on Android. The launch times were tracked with various start and end points. Each platform sent distinct types of properties in the metadata. This made it complicated to understand what exactly was included in the performance metrics and made it hard to make apples-to-apples comparisons.
Additionally, each platform was gathering ad-hoc performance logs along the way — some out of sync and not maintained anymore — resulting in multiple metrics tracking the performance of similar processes with slight variance. For data analysts and product managers, it created a maze of metrics that was difficult to navigate, with no guarantee they were looking at the right performance stats. They also had to keep track of the list of exceptions and edge cases on each platform to be aligned with a shared goal.
We need common metrics vocabularies and definitions across platforms for many reasons:
Internally, we want to be able to collect accurate results to help make business and product decisions. They provide us with aligned benchmarks and objectives for measurable goals. We also want them to be the north stars for performance targets that help us understand the impact of product changes on app performance.
Externally, our customers are cross-platform. They start the morning by checking notifications on the phone, come into the office and work on the Desktop app during the day, and finish a couple of messages on mobile apps on their commute home. It’s crucial to provide a consistently good app performance for them despite the devices or platforms they use. We need the metrics to be reliable, so we can show customers the performance status and trend of our apps across devices, to identify the focus area of their expectation, and to demonstrate the progress of continuous improvements.
For the metrics to be clearly defined and easily actionable, we need a clear protocol for the platforms to follow. We defined a shared interface for a batch of cross-platform performance metrics, including launch time, channel switch time, and frame rate stats. We use Thrift for the performance metrics specification, which provides us with a unified and language-neutral interface with strongly typed metadata. For the next iteration, we are exploring a move to Protocol Buffers, which have better support for Swift. The specification serves as an agreement on the structure and data types of the metrics and their parameters. It makes data aggregation, formatting, and monitoring simple for the data analysts. A single data pipeline not only prevents repeating the work for each platform but also makes it possible to compare the performance data on Desktop, iOS, and Android side by side. We can easily slice and investigate the stats to generate insights and encourage cross-team collaboration in performance improvements.
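To make the idea of a shared, typed specification concrete, here is a minimal Python sketch of what one unified metric event could look like. It is not Slack's Thrift schema: the class name PerformanceMetric, its fields, and the metric name "app_launch" are all hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict


class Platform(Enum):
    DESKTOP = "desktop"
    IOS = "ios"
    ANDROID = "android"


@dataclass(frozen=True)
class PerformanceMetric:
    """Hypothetical unified metric event; all field names are assumptions."""

    name: str
    platform: Platform
    duration_ms: int
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Python does not enforce annotations, so check the basics by hand.
        if not isinstance(self.platform, Platform):
            raise TypeError("platform must be a Platform value")
        if self.duration_ms < 0:
            raise ValueError("duration_ms must be non-negative")

    def to_record(self) -> dict:
        # Flatten to plain types so every client emits the same shape.
        record = asdict(self)
        record["platform"] = self.platform.value
        return record
```

Because every client builds the same record shape, a single pipeline can consume events from all three platforms without per-platform parsing. For example, PerformanceMetric("app_launch", Platform.IOS, 1200).to_record() yields a plain dictionary with the same keys an Android or Desktop event would have.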
In brief, there are three key advantages to having unified cross-platform metrics:
- Meet performance expectations consistently with shared performance goals
- Improve communication internally and externally with common metrics definitions and implementations
- Have a single data pipeline for data aggregation and analytics
Now let’s look at a case study of the app launch time and how we benefited from unified performance metrics across Desktop, iOS, and Android client apps.
Tackle app launch time tracking
First impressions matter when it comes to both people and apps. It’s essential to bring the most relevant and updated information to the user during each app launch. To ensure this, we needed to accurately track the current launch time for users on various devices.
How do we use cross-platform metrics to help us tackle launch time tracking? Each platform has different launch processes and implementations of network requests, data persistence, and UI rendering during app launch. We needed these metrics to echo the perspective of the users, regardless of their platform, to help us meet their expectations on app launch speed. We identified two metrics in the launch process that apply to all the platforms: Time to Visible (TTV) and Time to Usable (TTU).
Time to Visible
Time to Visible (TTV) is defined as the time from app launch to when locally cached content is displayed. From a Slack user’s point of view, it’s the time taken to show the first meaningful content — usually messages — so they can start from where they left off last time. Similar to the notion of First Meaningful Paint for web browsers, we wanted to capture the user experience of how fast they can start reading actual content in Slack, which captures their perception of the app launch speed.
For TTV, the app client has full control of this portion of app launch time. As cached data is available locally, the time to reach TTV is unaffected by backend or network performance. Even if the user is offline during app launch, the app is still able to render what was loaded before. For our users, they will see the conversation from where they left off. However, they might not be looking at the most recent content yet.
Time to Usable
Time to Usable (TTU) is defined as the time from app launch to the moment when the most recent content has been displayed on the screen. At this point, all content from the server is up to date, and the latest messages are available for the user to make decisions and contribute to the discussion.
The time to reach TTU could be affected by network condition, API delay, response size, etc. However, TTU reflects user experience when they are informed by the most recent messages and can take actions based on what they see, hence the term ‘Usable’. We don’t explicitly check whether the UI is interactive since the user might not choose to scroll or type messages right away. Since the content is up-to-date and all rendered on screen, the interactions are implied.
The example above was recorded on an iPhone 5S with a slow network connection to demonstrate TTV and TTU. We kick off the request to fetch new content as soon as possible, so the delay between visible and usable is usually shorter than shown here. The slower device and high network delay make the gap between each UI refresh visible:
- User taps on the Slack app icon
- The app launches to a loading screen
- Locally cached messages are displayed. TTV is complete
- The app receives new messages from a web request and updates the UI. TTU is complete
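The steps above can be sketched as a small timer that starts at launch and records the two milestones. This is an illustrative Python sketch, not the code any Slack client runs: the LaunchTimer class and its method names are hypothetical, and the clock is injectable only so the behavior can be checked without real delays.

```python
import time
from typing import Callable, Optional


class LaunchTimer:
    """Hypothetical helper that records TTV and TTU relative to launch."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()  # user taps the app icon
        self.ttv_ms: Optional[float] = None
        self.ttu_ms: Optional[float] = None

    def _elapsed_ms(self) -> float:
        return (self._clock() - self._start) * 1000.0

    def mark_visible(self) -> None:
        # Locally cached messages are on screen; only the first call counts.
        if self.ttv_ms is None:
            self.ttv_ms = self._elapsed_ms()

    def mark_usable(self) -> None:
        # Fresh content from the server is on screen; only the first call counts.
        if self.ttu_ms is None:
            self.ttu_ms = self._elapsed_ms()
```

A monotonic clock is used rather than wall-clock time so that a system clock adjustment during launch cannot produce a negative or inflated duration.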
Special cases everywhere!
It seems simple to track the launch process from the steps above. However, it’s quite complex with multiple factors involved, and we often run into special cases. We want to capture most of these special cases to be able to diagnose potential tracking issues and detect metrics regressions with accuracy and confidence. The scenarios are identified by client apps and marked explicitly in the metadata. We don’t exclude any launch cases, deferring the decision for the data pipeline to split or filter out stats from each case. Here are some examples of special cases we have logged for TTV/TTU:
No existing cache (TTV)
It’s possible to launch the application to a channel for which there is no cached content (e.g. a fresh install, or opening channels after resetting the cache), in which case a network request is required to populate the conversation. In this case, TTV is the same as TTU since we always need to ping the server to download the data.
On mobile, fewer than 1% of sessions have no cached content, but on Desktop it’s 100% because there is no local cache.
No new content (TTU)
It’s possible after pinging the server that the current channel is already in sync, meaning there is no more content to be downloaded. To the end-user, there won’t be an update of the UI after TTV. In this case, we still include the time of the request in TTU, but mark it with a flag.
On mobile, 90% of sessions have no new content to be synced, while all Desktop sessions need to pull content from the server.
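As a rough sketch of how these two cases might be flagged rather than dropped, the Python function below builds a launch record with explicit markers. The function name and the keys no_cache and no_new_content are hypothetical; the source only says the cases are marked in the metadata, not how.

```python
from typing import Optional


def build_launch_record(
    ttv_ms: Optional[float], ttu_ms: float, had_new_content: bool
) -> dict:
    """Hypothetical record builder; ttv_ms is None when nothing was cached."""
    no_cache = ttv_ms is None
    return {
        # With no cache, the first visible content is the server content,
        # so TTV is reported as equal to TTU.
        "ttv_ms": ttu_ms if no_cache else ttv_ms,
        # The request time still counts toward TTU even if nothing changed.
        "ttu_ms": ttu_ms,
        "no_cache": no_cache,
        "no_new_content": not had_new_content,
    }
```

Keeping both flags on every record leaves the choice of splitting or filtering to the data pipeline, instead of baking that decision into each client.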
The timing from app launch to TTV/TTU can get interrupted by user actions or the operating system, e.g. the user could decide to switch to another channel, receive a phone call, or lock their phone screen. We flag several categories of user interventions to be able to identify the app launch path.
We want to capture the experience of users waiting for the app to load. If they choose to move away from the message view, the landing channel’s visible and usable time won’t be perceived at launch. These cases need to be marked so they can be filtered out or split, which avoids introducing noise to the launch time stats.
Ways of launching the app
We track TTV/TTU for both cold and warm launches. A cold launch includes launching the app for the first time, after a device reboot, or after the app has been terminated by the OS, meaning the app process does not exist in the system’s kernel buffer cache. A warm launch means that the app resumes from the background or becomes active in the foreground from an existing instance of the app in memory.
Many actions could trigger the app launch. We tag each launch accordingly:
- On Android and iOS, we tag launches from the app icon, notifications, deep-links, etc.
- On Desktop, we tag launches from the app icon and when the client is loaded in the browser
App or database upgrades
If the launch happens after an app or app database upgrade, higher TTV and TTU may occur because the local cache has been cleared. We mark this case so it’s not confused with an actual regression on launch time.
All of the special cases above are logged as common properties for all clients. Despite the difference between platforms and implementation, we track them with the same parameters in the metadata, making it easy to spot any outliers by looking at the distribution of these parameters on an individual platform under the same data pipeline. For instance, we’ve noticed about 8% of iOS launches are triggered by a notification, while that amount is around 21% on Android. One possible reason is the notification area on top of the screen of Android indicating unread notifications, while on iOS there is no lingering indicator after a notification goes away. Comparing the range and distribution of the stats helps the app engineers to identify causes of performance issues and track the process of improvements with common data infrastructure.
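Because every platform logs the same parameters, a comparison like the notification share above reduces to one grouping step. The Python sketch below shows the idea; the record keys platform and trigger are hypothetical names, and the sample data in the usage note is made up for illustration, not taken from Slack's pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def trigger_share(launches: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return, per platform, the fraction of launches caused by each trigger."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for launch in launches:
        counts[launch["platform"]][launch["trigger"]] += 1
    return {
        platform: {
            trigger: n / sum(by_trigger.values())
            for trigger, n in by_trigger.items()
        }
        for platform, by_trigger in counts.items()
    }
```

For instance, feeding it three iOS launches tagged "icon" and one tagged "notification" gives a notification share of 0.25 for iOS. The same call works unchanged for Android or Desktop records, which is the practical payoff of shared metadata.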
Challenges and lessons learned
Cross-platform metrics measurement requires an extensive collaboration of developers from multiple clients, backend, and data teams. We’ve faced many challenges during the process:
- Platform constraints
- Different implementations on each platform (e.g. how messages are cached and rendered)
- Standardizing the metrics definition and deployments
- Managing data updates and versioning
- Collaboration among multiple platform teams
Here are some lessons we’ve learned so far from tackling these challenges.
Business objective-driven workflow
Working backward from mock analytics and a mock dashboard is super helpful to define the data format and to decide upon a reasonable range of stats. Have an expectation of what is normal and what is not. For example: What range should the metrics stats fall in? What possible values should metadata have? What should the distribution of metadata values be? If anything falls out of range, it could indicate either a tracking error or an actual performance problem to be addressed.
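One lightweight way to encode "what is normal" is a table of expected ranges that incoming records are checked against. The sketch below is in Python and purely illustrative: the field names and the bounds in EXPECTED_RANGE_MS are assumptions for the example, not thresholds Slack uses.

```python
from typing import Dict, List, Mapping, Tuple

# Assumed bounds for illustration only; real limits would come from the
# mock dashboard exercise described above.
EXPECTED_RANGE_MS: Dict[str, Tuple[float, float]] = {
    "ttv_ms": (0, 30_000),
    "ttu_ms": (0, 120_000),
}


def find_out_of_range(
    record: Mapping[str, float],
    expected: Mapping[str, Tuple[float, float]] = EXPECTED_RANGE_MS,
) -> List[str]:
    """Return the names of fields whose values fall outside the expected range."""
    return [
        name
        for name, (low, high) in expected.items()
        if name in record and not (low <= record[name] <= high)
    ]
```

A non-empty result does not say which kind of problem occurred. It only flags the record for a closer look, since the cause could be a tracking error or a genuine performance issue.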
Identify leads to follow through and make decisions
Since this work involves multiple teams on several metrics over a relatively extended period, it’s important to have directly responsible individuals leading the decision-making process. Try to pair up developers from each platform to work on the same metrics at the same time, and minimize the chances of developers joining or leaving the project as this can introduce friction.
Invest in processes and tooling
It’s essential to track the metrics updates as soon as changes are deployed to tighten the feedback loop. This process will be much smoother with investment in real-time debugging tools on all platforms; it’s impractical to wait for 2–4 weeks for production data to verify the validity of data, especially for the mobile release cycle. You need to be able to detect tracking errors and performance trends both locally and on dogfood or beta builds.
Early knowledge sharing and training
There will be knowledge gaps as client developers learn how other platforms work, how data is sent, formatted, and stored in the data warehouse, and the most efficient way to organize and query data and set up dashboards. It’s beneficial to encourage and coordinate knowledge sharing to get on the same page and avoid surprises down the road.
With unified cross-platform performance metrics, application developers can set shared goals for a consistent end-user experience on all client apps.
This isn’t the end of the story for metrics measurement improvements, though; we’re working on automated performance testing and regression detection, along with adding more granularity to metrics during app sessions and on specific user actions. It’s just the beginning for the cross-platform performance metrics to mirror our users’ experience and help us make Slack apps faster.
Interested in the teamwork described in this article? Stay tuned to our Engineering blog for more information on what’s next in performance measurement and optimization. Or even better: Come and join us!