Scaling Mobile Device Management for macOS with Chef at Uber


At Uber, the Client Platform Engineering (CPE) team is responsible for both building our IT infrastructure and advancing how we think about managing endpoints for workplace productivity. With over 20,000 global employees, hundreds of offices, and an ever-growing number of device types, we need to build flexible pipelines that let us easily inject new management workflows without slowing us down or incurring technical debt.

Over the past few years, the Uber CPE team deployed Chef as our primary endpoint management platform: a single system that scales across multiple operating systems and requires code review for every deployed change. Unlike traditional GUI-based management platforms, Chef allowed the team to create deterministic workflows based on Uber’s unique requirements.

While Chef continues to be the primary management tool at Uber, Apple’s implementation of new security features, such as Device Enrollment, Secure Kernel Extensions, and Privacy Preferences Policy Control, gated some portions of administration behind mobile device management (MDM).

In this article, we discuss how the CPE team scaled MDM on macOS at Uber through the standardization of user enrollment and versioning, as well as by leveraging API-driven Chef cookbooks to orchestrate and configure services for users across the company.

Mobile Device Management (MDM) at Uber

Although we use Chef as our primary management tool, in the past few years, especially with regard to macOS, we needed to adopt an MDM system to supplement our workflows.

After looking at many open source and third-party MDM solutions, we decided to extend our partnership with a third-party vendor and add macOS endpoints into its digital workspace platform.

We came to this decision for the following reasons:

  • Strength in numbers: We could leverage our mobility engineers during the implementation and design stages as this digital workspace solution was already being used for devices with iOS and Android operating systems.
  • Powerful APIs: We could leverage both the InstallApplication and InstallEnterpriseApplication APIs to install our custom tooling. Moving our custom tooling behind MDM meant that employees must be enrolled into MDM in order to obtain access to other tools.
  • User-level profiles: Because this digital workspace solution was so closely coupled with enrolled users, we could implement both device-based MDM profiles and user-based MDM profiles. Open source tools like Chef, munki, and Puppet currently support only device-based, local profiles. User-level profiles allowed us to improve our certificate installation pipeline for access to internal tooling.
  • Security features: Useful security features allowed us to instantly target machines using the Apple Push Notification Service (APNS).

Despite MDM being more mature today than ever before, Chef is still our primary management tool for ensuring our machines are in a known state and have a clear audit trail of all changes made to the fleet.

Standardizing macOS MDM enrollment

While Apple offers the Device Enrollment Program (DEP) for enrolling devices into MDM, larger companies can find themselves in a unique position where a bifurcated enrollment process is the only possible outcome.

Some issues with procuring DEP-capable devices include:

  • Worldwide availability: Your company may have offices or employees where DEP is not available. In some countries, your only possibility is purchasing Apple products that are marked as “consumer” devices. Consumer devices cannot currently be moved into the DEP workflow.
  • Provisional DEP: Provisional DEP is a process by which you can convert a consumer device to an enterprise device. Unfortunately, this is only offered for iOS devices (i.e., iPads, iPhones, and iPods).
  • Authorized Apple resellers: While many countries have authorized Apple resellers, their terms may not align with your company’s, or they may simply not be a recommended partner. If a country has only one authorized Apple reseller and that reseller is not a fit, you are forced to purchase consumer devices.

In addition to these points of friction, Uber’s global scale led to further issues with the MDM user experience. At our scale, we knew that getting every macOS user to read an email or a message in our internal systems was borderline impossible. To make matters worse, because DEP is not available everywhere we procure devices, some users would need to follow a DEP MDM workflow while others would need to follow a standard MDM enrollment procedure. Moreover, during internal testing of our MDM system, some users with Chrome extensions like Google Music were flagged as having a tool labeled as a “remote configuration,” which triggered a new Apple system called User-Approved MDM. This meant that even users who followed the instructions would need additional instructions to achieve full compliance.

After much discussion, the CPE team came to the conclusion that we would need to write a custom wrapper around MDM enrollment. After some lengthy and tedious coding sessions, we decided to leverage UMAD, or Universal MDM Approval Dialog, deployed via Chef.

The UMAD wrapper offers the following features:

  • Dynamic DEP enrollment supported by macOS 10.9.5 and higher.
  • Fallback MDM enrollment if DEP fails to trigger on DEP-capable devices.
  • MDM enrollment for non-DEP-capable devices.
  • User-Approved MDM detection and failure messaging.
  • A cut-off date, with increasingly aggressive UI prompts as the cut-off date nears.

Figure 1. Given the scope and scale of Uber’s business, setting up our MDM system for macOS was often a web of complexity, with multiple possible routes based on a variety of factors.
Figure 2. In the DEP Enrollment UI, the user is informed that they will receive a secondary notification on their computer (screenshot embedded within the UI under the third paragraph), and will need to click on the “Details” button to finish the enrollment process.
Figure 3. In the Non-DEP Enrollment UI, a manual enrollment process is required and a new “Manual Enrollment” button appears. The user is prompted to click this button.
Figure 4. In the User-Approved MDM Enrollment UI, the user must go to the Profiles pane of System Preferences and approve the dialog box. They are given a “System Preferences” button that allows them to quickly enter the System Preferences pane.
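
The enrollment paths and the cut-off behavior listed above can be summarized as a small decision function. The following Ruby sketch is illustrative only: UMAD is a separate tool, and the method names, inputs, symbols, and day thresholds here are hypothetical and not taken from UMAD's code or configuration.

```ruby
require 'date'

# Hypothetical model of the enrollment paths described above.
# Returns a symbol naming the UI a device should be shown.
def enrollment_prompt(enrolled:, user_approved:, dep_capable:, dep_triggered: true)
  return :none if enrolled && user_approved     # fully compliant, no prompt
  return :user_approved_mdm if enrolled         # enrolled, approval still pending
  return :manual_enrollment unless dep_capable  # non-DEP-capable device
  # DEP-capable: use DEP, falling back to manual enrollment if DEP fails to trigger.
  dep_triggered ? :dep_enrollment : :manual_enrollment
end

# Hypothetical escalation: prompt more often as the cut-off date nears.
# The thresholds (3 and 14 days) are invented for illustration.
def prompt_interval_minutes(today, cutoff)
  days_left = (cutoff - today).to_i
  return 10 if days_left <= 3
  return 60 if days_left <= 14
  240
end

puts enrollment_prompt(enrolled: false, user_approved: false, dep_capable: false)
puts prompt_interval_minutes(Date.new(2019, 1, 1), Date.new(2019, 1, 31))
```

Keeping the decision in one pure function makes each route in the enrollment flow easy to test without touching a real device.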

With UMAD tested, we gradually deployed it to our devices using the cpe_umad cookbook.

We noticed an immediate impact on the speed and efficiency of Uber’s MDM enrollment. Within one week, UMAD had helped 7,000 employees enroll in MDM, and within six weeks that number had grown to 16,000. Of those 16,000 users, approximately 4,000 had some kind of misconfiguration on their machines that was quickly remediated by enrolling in MDM using UMAD. Within 10 weeks, UMAD had helped 92 percent of Uber employees enroll their devices into MDM.

Standardizing macOS versions

Despite frequently updating and enhancing our Chef platform as well as deploying UMAD across the company, one of the biggest challenges we faced involved dealing with the intricacies of handling multiple versions of macOS across employee devices.

In 2018, we were supporting a large number of macOS versions. Just as we had standardized macOS MDM enrollment with UMAD, we needed a comparable system for system updates and upgrades. This would be critical for securing Uber data as well as for providing a consistent development platform for the company.

While we used munki for both self-service applications and Apple software updates, a few of our team members had seen various issues with operating system updates:

  • FileVault: Devices with FileVault may not have an authenticated reboot. A user would start the upgrade and go to lunch, only for it to time out or fail while waiting for the user to authenticate.
  • T1 devices: T1 macOS devices required internet access at the time of installation, and munki may not be able to connect to the internet at the LoginWindow, potentially causing the installation to fail.
  • T2 devices: T2 macOS devices included additional security on top of the T1 requirements. macOS updates and upgrades now require the machine to halt (shut down) rather than reboot. At the time, munki could not handle halting T2 devices, resulting in potential install failures.
  • Disparities in User Experience: The macOS user experience differs between operating systems: 10.5 through 10.8 devices had a Software Update preference pane, 10.9 through 10.13 devices had a unified Mac App Store and update tab, and 10.14 devices removed the unification and re-introduced the Software Update preference pane, but modernized it to look similar to iOS.
  • Disparities during testing on Apple devices: The user experience within munki is different than the user experience when going through the Mac App Store or Software Update preference pane, and can result in unexpected behavior.

At our scale, even a 1 percent failure rate for device updates and upgrades could impact hundreds of devices, so we decided to leverage Nudge, a standalone open source tool for handling updates. As with UMAD, Nudge would be configured and deployed via Chef.

Nudge is an open source, unified wrapper around macOS major upgrades (10.13 through 10.14), macOS minor delta updates (10.14.0 through 10.14.1) and macOS minor combo updates (10.14.0 through 10.14.2). An administrator simply needs to ensure that the macOS installer is available on the devices and configure Nudge with the minimum operating system that devices should be running.
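
To make the idea of a configured minimum operating system concrete, here is a minimal Ruby sketch of the comparison involved. It is not Nudge's implementation: the method name and return symbols are hypothetical, and it relies only on Ruby's standard `Gem::Version` class to compare dotted version strings.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default in most Ruby setups)

# Hypothetical helper: decide what a device needs, given the minimum
# macOS version an administrator has configured.
def required_action(current, minimum)
  cur = Gem::Version.new(current)
  min = Gem::Version.new(minimum)
  return :none if cur >= min

  # Compare the first two segments (e.g. 10.13 vs 10.14) to tell a
  # major upgrade apart from a minor update.
  cur.segments.first(2) == min.segments.first(2) ? :minor_update : :major_upgrade
end

puts required_action('10.13.6', '10.14.1') # prints "major_upgrade"
puts required_action('10.14.0', '10.14.1') # prints "minor_update"
puts required_action('10.14.3', '10.14.1') # prints "none"
```

Comparing parsed versions rather than raw strings avoids mistakes such as treating 10.9 as newer than 10.14.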

Nudge allows us to unify both the upgrade and update experience into a single window or message, and then systematically link our interface to the various interfaces and tools Apple uses. By linking directly to Apple’s own binaries, we no longer have to worry about unexpected behaviors and can ensure a user experience that is a direct reflection of what non-enterprise Apple customers would see.

In simpler terms, unless there is an issue with Apple’s own installer or installation process, Nudge will not introduce any new complexity or untested procedure that Apple may not gracefully handle.

Figure 5. Nudge’s UI informs a user that they have a pending macOS update to install.

Around the end of 2018, we felt confident enough in our toolchain to standardize on macOS 10.14.1. At the time, Nudge only supported macOS major upgrades, so we did not target machines already running 10.14.0. Our most-used operating system was High Sierra 10.13.6, with close to 9,000 devices running it. By the fifth day of deploying Nudge in early November 2018, 10.14.1 had taken the lead with over 7,000 devices. In late 2018, with the help of Airbnb, Nudge gained minor update support. Now that we are beyond the 90th day of Nudge being deployed, 10.14.3 (the most current version of macOS in March 2019) is our most-used operating system, and over 87 percent of Uber systems are running some version of Mojave.

Best of all, our service desk has not (yet) issued any tickets to our team regarding Nudge upgrade failures.

At Uber, we manage Nudge with our cookbook cpe_nudge, which you can also find on our GitHub.

Figure 6. Over a period of 90 days, deploying Nudge led to a significant number of the computers we manage being upgraded to macOS 10.14.

Chef as a service

The primary design principle we are following in Chef is the concept of API cookbooks. This idea has been championed by both the Facebook Production Engineering and Facebook Client Platform Engineering teams over the years.

Some benefits from API cookbooks include:

  • Rather than hard-coded default values, values are “nil” or “false.” By abstracting these values, you do not have to keep writing new core code when the content you are manipulating changes; you simply update the key/value pairs.
  • Key/value pairs can be conditionally overridden. This allows you to leverage a single cookbook that can work across multiple operating systems.
  • Since each value can be overridden, you can extend these same APIs to your end-users, allowing them to write their own values for features. For instance, cpe_user_customizations is what we use to allow our end users to customize actions like their default software installations and screensavers. These persist across all computers leveraged by a single end user.
  • APIs allow you to abstract code to other teams that may not fully understand how to use Chef but can understand key/value pairs. This greatly reduces the engineering cost for maintenance.
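
A minimal sketch of the layering idea in the list above, written in plain Ruby rather than Chef's attribute DSL so that it runs on its own. The keys, platform names, and screensaver values are hypothetical and chosen only for illustration.

```ruby
# Defaults for a hypothetical API cookbook: nothing is opinionated.
DEFAULTS = { 'enable' => false, 'screensaver' => nil, 'idle_minutes' => nil }.freeze

# Team-level values, overridden conditionally per operating system.
def team_overrides(platform)
  case platform
  when 'mac_os_x' then { 'enable' => true, 'screensaver' => 'Flurry', 'idle_minutes' => 10 }
  when 'windows'  then { 'enable' => true, 'screensaver' => 'Ribbons', 'idle_minutes' => 10 }
  else {}
  end
end

# Later layers win: defaults < team overrides < end-user customizations.
def resolve(platform, user_overrides = {})
  DEFAULTS.merge(team_overrides(platform)).merge(user_overrides)
end

p resolve('mac_os_x', { 'screensaver' => 'Drift' }) # user value replaces the team value
p resolve('ubuntu')                                  # no overrides, so defaults remain
```

Because every layer is just key/value pairs, someone who does not know Chef can still change behavior by editing one hash.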

Given how beneficial open source cookbooks have been to our team, Uber approaches writing API cookbooks with open source in mind; in fact, there are zero opinionated default settings that apply just by including the cookbook in a runlist. Each API cookbook is disabled by default and must be both enabled and given a configuration before any state will be managed.

Below, we outline the primary reasons Uber leverages this approach to cookbook development:

  • Being a good open source community member (contributing code, not just consuming it) improves code quality and can create robust feature sets.
  • Writing generalized cookbooks forces abstraction, which decreases specific logic and potential technical debt while also decreasing the likelihood of leaking proprietary information by accident.

The following code snippets are an example of what an endpoint engineer may use to deploy Sal (a modular, open source reporting tool for endpoints) to their environment. cpe_sal would run first, followed by cpe_profiles and cpe_launchd, both necessary cookbooks for cpe_sal to consume.
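
As a rough illustration of the ordering just described (not the exact run list Uber uses), the three cookbooks can be modeled as an ordered list. In Chef itself this would live in a role, policy, or node definition; plain Ruby is used here so the snippet runs on its own.

```ruby
# Illustrative run list: cpe_sal first, then cpe_profiles and cpe_launchd.
RUN_LIST = [
  'recipe[cpe_sal]',
  'recipe[cpe_profiles]',
  'recipe[cpe_launchd]',
].freeze

# Recipes are processed in run-list order.
RUN_LIST.each_with_index { |item, i| puts "#{i + 1}. #{item}" }
```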

An endpoint engineer might assume that three cookbooks are running and, while this is technically true, if we drill into the cookbook, we will see a pattern: cpe_sal has an attribute file and the primary attributes are either false or nil.
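
A sketch of what such an attribute file could look like. The keys below (`install`, `configure`, `config`, and the entries under it) are illustrative guesses rather than a copy of the real cpe_sal attributes, and a plain hash stands in for Chef's `default[...]` attribute DSL so the snippet runs without Chef.

```ruby
# Stand-in for an attributes file: every switch is false, every value nil.
CPE_SAL_DEFAULTS = {
  'install' => false,    # do not install the Sal client
  'configure' => false,  # do not manage Sal preferences
  'config' => {
    'ServerURL' => nil,  # hypothetical preference keys
    'key' => nil,
  },
}.freeze

# With these defaults, no feature is switched on and no value is set.
switched_on = CPE_SAL_DEFAULTS.select { |_name, value| value == true }.keys
puts "enabled features: #{switched_on.inspect}" # prints "enabled features: []"
```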

In the actual cpe_sal resource file (the main code for cpe_sal), there are `return` guards around each action. If the values are false, that particular function does nothing. This allows us to place cpe_sal in our global run_list, without any concern that it will actually apply settings to devices. You must explicitly mark them as true.
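
The guard pattern can be sketched in plain Ruby as follows. This is not the cpe_sal resource itself: the method names and attribute keys are hypothetical, and each method returns a description of what it would do instead of declaring Chef resources, so the effect of the guards is easy to observe.

```ruby
# Hypothetical converge logic for an API cookbook.
def converge_sal(attrs)
  install_sal(attrs) + configure_sal(attrs)
end

def install_sal(attrs)
  return [] unless attrs['install']    # guard: feature not enabled
  ['install sal client']
end

def configure_sal(attrs)
  return [] unless attrs['configure']  # guard: feature not enabled
  config = attrs['config'].compact     # drop keys that are still nil
  return [] if config.empty?           # guard: enabled but nothing to apply
  ["write sal preferences: #{config.keys.join(', ')}"]
end

# Defaults: safe to include everywhere, because nothing happens.
p converge_sal({ 'install' => false, 'configure' => false, 'config' => {} })
# Explicitly enabled and configured: actions are taken.
p converge_sal({ 'install' => true, 'configure' => true,
                 'config' => { 'ServerURL' => 'https://sal.example.com' } })
```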

Since Chef allows you to create any cookbook, you could simply create a cookbook that applies the attributes that your open source cookbooks need to consume. In the example below, we create a cpe_base cookbook that has some of the attributes cpe_sal would need to actually run a configuration. This type of cookbook will differ per company based on what open source cookbooks you consume, so create whatever you would like here and add it to your runlist.
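
A minimal sketch of how a company-specific cookbook might supply those values. In Chef this would be done with attribute assignments inside the cpe_base recipe; here a recursive hash merge stands in for that behavior. The keys and the example URL are placeholders, not real cpe_sal settings.

```ruby
# Defaults as an open source cookbook might ship them (illustrative keys).
SAL_DEFAULTS = {
  'install' => false,
  'configure' => false,
  'config' => { 'ServerURL' => nil, 'key' => nil },
}.freeze

# What a company-specific cpe_base cookbook might set. Values are placeholders.
BASE_OVERRIDES = {
  'install' => true,
  'configure' => true,
  'config' => { 'ServerURL' => 'https://sal.example.com' },
}.freeze

# Recursively merge nested hashes so untouched keys keep their defaults.
def deep_merge(base, extra)
  base.merge(extra) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

p deep_merge(SAL_DEFAULTS, BASE_OVERRIDES)
```

Only the company-specific hash contains private values, which is what lets the cookbook holding the defaults be published as-is.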

The cookbook cpe_base is now added to the original run list and needs to come before cpe_sal, so that cpe_sal can consume the attributes you have modified.
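
Why the order matters can be shown with a deliberately simplified model in which each cookbook is a function that reads or writes shared node attributes. This ignores Chef's separate compile and converge phases, and the attribute names are hypothetical.

```ruby
# Each "cookbook" is modeled as a lambda over shared attributes and a log.
COOKBOOKS = {
  'cpe_base' => lambda { |node, _log| node['cpe_sal'] = { 'install' => true } },
  'cpe_sal' => lambda { |node, log|
    log << 'install sal client' if node.dig('cpe_sal', 'install')
  },
}.freeze

# Run cookbooks in the given order and report what cpe_sal did.
def converge(run_list)
  node = { 'cpe_sal' => { 'install' => false } } # defaults: disabled
  log = []
  run_list.each { |name| COOKBOOKS.fetch(name).call(node, log) }
  log
end

p converge(%w[cpe_base cpe_sal]) # cpe_sal sees the overrides and acts
p converge(%w[cpe_sal cpe_base]) # overrides arrive too late, nothing happens
```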

By using this approach, we are able to open source cpe_sal without exposing any of the custom code we have implemented. These snippets are just a small example of how powerful API-driven cookbooks can be when leveraged correctly.

Moving forward

Scaling mobile device management at a hyper-growth company like Uber fostered a creative mindset that relied heavily on existing management tools like Chef. Our experience developing MDM systems at scale gave us exposure to open source solutions we may not have previously considered for production environments. In this spirit, we encourage you to check out Nudge and our CPE Chef cookbooks for yourself and start building a more efficient MDM system for your workplace.

If tackling large-scale IT engineering problems interests you, consider applying for a role on our team!
