Hacklang at Slack: A Better PHP



Slack launched in 2014 with a PHP 5 backend. Along with several other companies, we switched to HHVM in 2016 because it ran our PHP code faster. We stayed with HHVM because it offers an entirely new language: Hack (searchable as Hacklang).

Hack makes our developers more productive through better tooling. Hack began as a superset of PHP, retaining its best parts, like the edit-refresh workflow and the request-oriented memory model, which enable speedy development. In addition to a number of quality-of-life improvements, Hack adds a better type system and a static type checker, which help catch bugs and allow developers to code and refactor with more confidence.

In this post we’ll talk about how and why we migrated to Hack, the benefits it gave us, and things to consider for your own codebase.

PHP’s type system has come a long way since PHP 5, when it was not possible to annotate return types, class properties, or scalar types. Its remaining holes, like the lack of generics, may be resolved in the future. But its biggest flaw is that types are only checked at runtime. This is the most costly time to find out about type-related bugs: they surface as a failing test suite or, worse, as a user report or an entry in a production error log.

With Hack, type checking happens statically (without running the code) and as you type. Change the signature of a function with hundreds of call sites, and you’ll see errors for the ones that need updating before even hitting save.
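As a minimal sketch of that workflow, suppose a hypothetical function greet() (invented for illustration, not taken from Slack's codebase) has just gained a required second parameter. The call sites that no longer match are shown commented out, with the kind of error the type checker reports paraphrased rather than quoted:

```hack
// Hypothetical helper: the int $times parameter was just added.
function greet(string $name, int $times): string {
  $out = '';
  for ($i = 0; $i < $times; $i++) {
    $out .= 'Hello, '.$name."\n";
  }
  return $out;
}

<<__EntryPoint>>
function main(): void {
  // Matches the new signature, so the type checker accepts it.
  echo greet('Ada', 2);

  // These stale call sites are flagged statically, before anything runs:
  // echo greet('Ada');        // too few arguments
  // echo greet('Ada', 'two'); // a string where an int is expected
}
```

The point of the sketch is the commented-out lines: in an editor with the type checker attached, they would be underlined as soon as the signature changed.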

This is a game changer for productivity: the difference between finding a bug milliseconds after typing it and waiting for a comprehensive test suite to catch it (or finding out after deploying) is hard to overstate. It’s akin to the productivity difference between developing websites in PHP vs. C. With Hack, you don’t bother trying to run the code until the type checker is passing, and by then it usually just works. This allows Slack developers to build and refactor with confidence, focusing testing efforts on higher-value areas like logic bugs, which static typing can’t help prevent.

Static type checking is possible in PHP with community packages, and if you’re using PHP I’d strongly recommend using one of these. However, Hack’s type checker has the advantage of a much more full-featured type system to work with. Hack is built from the ground up for static type checking, with features PHP lacks, like generics, shapes, enums, Hack arrays, and a well-typed standard library, that make rigorous static analysis possible.

We started with Hack in partial mode, which treats all untyped values as the “any” type, usable for any purpose. TypeScript takes the same approach. This enabled an incremental migration, with types added over time. As files became fully typed, we changed them to the default strict mode so that they stayed that way.

Surprisingly, gradually adding types to a weakly-typed codebase made me more thoughtful about type safety than I ever was working in strongly-typed languages like Java or Go. Instead of a requirement to get the compiler to run, types were a conscious decision to add value to the codebase. We had to justify spending time adding types by observing how they changed our working lives. Some parts of the codebase were easy to type, but others required refactoring to enable type safety.

Not only have we found and prevented bugs, but types also serve as a form of in-line documentation that is verifiable (unlike comment blocks), helping everyone read and understand the code. They also serve as a contract between different parts of the codebase. This has been crucial to productivity in a large, shared codebase like Slack’s backend.

Hack’s type system has one feature in particular, Shapes, that caught on like wildfire at Slack, and I believe it’s the reason we never looked back once we introduced Hack to our codebase.

PHP’s array type, bewilderingly, can act as both a list (an ordered sequence of values) and a map (a set of key-value pairs) at the same time. Most programming languages use separate types for these. In my experience, this is an endless source of bugs in PHP code, especially as functions like array_merge treat list-like and map-like arrays differently.

Hack improves upon this by separating these into different types and using generics to describe the types of their keys and values. A list-like array containing strings is a vec<string>, and a map-like array with string keys and integer values is a dict<string, int>.
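A short sketch of the two types working together, using a hypothetical count_words() helper that is not from the original post: it takes a vec<string> and returns a dict<string, int>, so both the list-like input and the map-like output are visible in the signature.

```hack
// Hypothetical helper: counts how often each word appears.
function count_words(vec<string> $words): dict<string, int> {
  $counts = dict[];
  foreach ($words as $word) {
    // Missing keys default to 0 before incrementing.
    $counts[$word] = ($counts[$word] ?? 0) + 1;
  }
  return $counts;
}

<<__EntryPoint>>
function main(): void {
  $words = vec['apple', 'pear', 'apple'];
  $words[] = 'fig'; // appending keeps the value a vec<string>

  foreach (count_words($words) as $word => $count) {
    echo $word.': '.(string)$count."\n";
  }

  // Rejected by the type checker: a dict is not a vec.
  // count_words(dict['apple' => 1]);
}
```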

But what about dicts that contain multiple types?

dict<string, mixed> is a valid, but not particularly useful type annotation, which says the dict contains string keys and values of any type.

Enter shapes. A shape is an array that contains known keys with specific types. A key is optional if it is preceded by ?. This example shape definition represents the arguments of an HTTP POST request, which has many optional fields:

type http_post_options = shape(
  ?'timeout' => int,
  ?'port' => int,
  ?'http_basic_auth' => string,
  ?'headers' => dict<string, string>,
  ?'form_data' => dict<string, string>,
  ?'json_payload' => JsonSerializable,
  ?'user_agent' => string,
  ?'follow_redirects' => bool,
);

This function signature uses that shape to type the $options:

function http_post(
  string $url,
  http_post_options $options
): http_response {
  // ... implementation here
}

A call site might look like this:

$result = http_post('https://example.com', shape(
  'timeout' => 10,
  'form_data' => dict['example' => 'test'],
));

Not only does this help ensure the correct types are used for each field, it also helps prevent typos for the names of keys both in the call site and in the function body where the shape is accessed. This makes the shape much more impactful to developer productivity than a simple array type annotation. Before shapes, assembling a call to such a function would require reading its body or a large doc block (which may not be fully up to date) to understand the names, expected types, and “optional vs. required” status for each argument.
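To illustrate the function-body side, here is a minimal sketch of reading optional shape fields. The trimmed-down example_post_options type, the describe_options() function, and the default values (30 and true) are all assumptions made for this example; they are not Slack's implementation.

```hack
// Hypothetical, trimmed-down version of the options shape above.
type example_post_options = shape(
  ?'timeout' => int,
  ?'follow_redirects' => bool,
);

function describe_options(example_post_options $options): string {
  // Shapes::idx returns the field if it is set, otherwise the default.
  $timeout = Shapes::idx($options, 'timeout', 30);
  // The null-coalescing operator also works for optional fields.
  $follow = $options['follow_redirects'] ?? true;

  // Rejected by the type checker: the shape has no such field.
  // $typo = $options['timeuot'] ?? 30;

  return 'timeout='.(string)$timeout.
    ' follow_redirects='.($follow ? 'yes' : 'no');
}

<<__EntryPoint>>
function main(): void {
  echo describe_options(shape('timeout' => 10))."\n";
  echo describe_options(shape())."\n";
}
```

Because every field is optional, an empty shape() is a valid argument, and the function has to say explicitly what happens when a field is absent.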

Shapes are used for a variety of use cases at Slack, including:

  • Database rows (code-generated shapes directly from DB schema)

As more features are added to Slack, each request tends to have more work to do. To keep the user experience snappy, parallelism is a common solution — doing multiple things at the same time in a single request.

In many programming languages, adding parallelism means adding significant complexity with mutexes, thread-safe data structures, or callbacks. These things slow developers down, making code more difficult to reason about and debug.

Hack is one of a handful of languages that implement the async/await pattern for multitasking without multithreading. Async/await is a simple abstraction that allows functions to be paused while waiting for I/O, freeing up the runtime to schedule other tasks. By simply adding the async and await keywords and following a few guidelines, code can be migrated to take advantage of parallelism without breaking the mental model of how the code works.

Here’s an example using the concurrent code block to fetch data from two sources at once. These fetches were previously done sequentially. Adding await and concurrent keeps the code easy to read while allowing the fetches to take place in parallel.

async function get_mentions(User $user): Awaitable<vec<Mention>> {
  concurrent {
    // fetch @user mentions
    $at_mentions = await get_at_mentions($user);
    // fetch @channel mentions for channels the user is in
    $channel_mentions = await get_at_channel_mentions($user);
  }
  return sort_mentions($at_mentions, $channel_mentions);
}
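For contrast, here is a self-contained sketch that puts a sequential version next to a concurrent one. The fake_fetch() function is a hypothetical stand-in that sleeps instead of doing real I/O, and the labels and delays are arbitrary; none of it comes from Slack's code.

```hack
// Hypothetical stand-in for an I/O-bound fetch: waits, then returns a label.
async function fake_fetch(string $label, int $usecs): Awaitable<string> {
  await \HH\Asio\usleep($usecs);
  return $label;
}

// Sequential: the second fetch starts only after the first has finished.
async function fetch_sequentially(): Awaitable<vec<string>> {
  $a = await fake_fetch('at_mentions', 100000);
  $b = await fake_fetch('channel_mentions', 100000);
  return vec[$a, $b];
}

// Concurrent: both fetches are started together and their waits can overlap.
async function fetch_concurrently(): Awaitable<vec<string>> {
  concurrent {
    $a = await fake_fetch('at_mentions', 100000);
    $b = await fake_fetch('channel_mentions', 100000);
  }
  return vec[$a, $b];
}

<<__EntryPoint>>
function main(): void {
  // \HH\Asio\join blocks until the awaitable has a result.
  $sequential = \HH\Asio\join(fetch_sequentially());
  $concurrent = \HH\Asio\join(fetch_concurrently());
  echo implode(', ', $sequential)."\n";
  echo implode(', ', $concurrent)."\n";
}
```

Both functions return the same values; the only change between them is the concurrent block, which is why the migration described above can be done without restructuring the surrounding code.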

HHVM has come a long way since Slack began using it. Breaking compatibility with PHP was a controversial decision that required us to eliminate every last line of PHP code, and every PHP dependency, from our codebase, but it has enabled huge efficiency and soundness improvements to the language. Since the HHVM 4.0 release that removed PHP support, the developers have rapidly removed “PHPisms” that inhibit type safety and/or performance, while adding useful new features. Keeping up with these updates in a large codebase is nearly a full-time job.

The largest downside to leaving the PHP community is the loss of an extensive ecosystem of open source packages on Packagist. Luckily, Hack projects can still be published on Packagist, and there are several high quality ones:

  • HHAST enables expressive lint rules and automated code migrations with a Syntax Tree, unlike PHP’s packages which involve parsing a token stream

As Hack frees itself from its PHP past, I’m excited to see it become a first-class language in its own right. While it’s no longer feasible to gradually migrate a PHP codebase to Hack, I expect to see more developers choose Hack for new projects as the language stabilizes, especially if they have familiarity with PHP and are looking for something better.

There’s a general trend in the industry towards adding static type checking to interpreted languages, with multiple options for Python, JavaScript, and Ruby. Combining the convenience of interpreted languages with static type checking is worth considering for codebases of all sizes.


