Ever open a file on dropbox.com, or click a shared link your coworker sent you? Chances are you didn’t need to download the file to see it—you saw it right in the browser. This is the work of the Previews team at Dropbox.
Previews are part of the core Dropbox experience. They allow architects to access their entire portfolios on dropbox.com while at the job site to show their work. Designers can send work-in-progress to clients without worrying about whether they have the correct software installed. Office managers can review, comment, and annotate new office design proposals, regardless of the file format.
For many users, a preview is their first interaction with Dropbox. Close to half of the previews we serve are documents (in formats including PDF, Microsoft Office, and Open Office). Unlike images, documents need to have a preview generated. This can take time. Our users want to see their content as soon as possible, so we have to provide great performance.
Over the last year, the Previews team has been on a journey to make our document preview experience the fastest in the industry, and we’re happy to share what we learned.
At Dropbox, all documents are converted to PDFs before being previewed. This preserves as much detail from original files as possible, while achieving compatibility on all clients. Our sibling team, Previews Infra, manages a large fleet of servers that handles file format conversion for us. Thanks to their work, our task is reduced to figuring out how to display PDF documents in the browser as fast as possible.
Early versions of Dropbox directly embedded the PDF on the web page and relied on the browser to render the file. PDF renderers in browsers tend to have very good performance and high fidelity. However, the downsides were significant. We had very little control over the look and feel of PDF viewers. This made it hard for us to achieve a consistent user experience across browsers. More importantly, we had no ability to add collaborative features like comments, annotations or highlighting.
For the user, previews aren’t just a core experience—they’re critical. This means it’s paramount for us to render them first, even at the expense of delaying collaborative features such as commenting.
While content prioritization was quite successful, it still took a long time to download the full document and PDF.js. Further performance improvement required us to explore approaches beyond PDF.js.
Speed is paramount
For users to perceive a preview as fast, you have to show them something as quickly as possible. Taking as an example iOS apps that display a screenshot before actual functionalities are fully loaded, we wondered if it would be enough to simply display a high-resolution thumbnail of the first page until the document was downloaded and rendered. This idea turned out to perform very well in real world usage. It dramatically reduced the perceived slowness of the preview.
Later, we began to automatically load and render thumbnails for subsequent pages as the user scrolls. Aside from not being interactive, thumbnails have very little difference from the actual previews rendered with PDF.js.
Beyond performance improvements, our First Page View work gave us validation for another idea we had pondered for a long time: Server-Side Rendered Previews.
Although using PDF.js represented a huge improvement over embedding the PDF directly, it also brought a number of challenges.
First of all, integrating PDF.js with Dropbox was quite difficult, if not downright hacky. PDF.js was designed to be Firefox’s integrated PDF viewer, rather than a component of another product, so it provided limited support for our use case. Furthermore, PDF-based exploits are extremely common, so we decided to put PDF.js into an iframe on a separate domain, so that malicious PDFs could not access cookies on dropbox.com. This dramatically complicated our build and deploy process. It also necessitated a cumbersome
postMessage call for communication between PDF.js and the main frame.
More importantly, performance-wise, PDF.js left a lot to be desired. PDF is an incredibly complex file format—the specification is more than a thousand pages long, not including the extensions and supplements. Thus the PDF.js implementation is highly elaborate, which resulted in long download and execution time, and there is little we can do to improve it.
For some time, the team had entertained the possibility of moving the render pipeline to the server side. This would allow us to transfer only the part of the document visible to the user, dramatically improving TTI.
Hacking an alternative to PDF.js
Because we had been using PDF.js for a long time, it was natural to first explore simply moving it to the server. Unfortunately, running PDF.js with Node results in poor image rendering quality. Not acceptable. The proposed alternative was running PDF.js in Chrome. At Dropbox, binaries are executed in a jailed environment and only whitelisted system calls are allowed. While borderline paranoid, this approach has saved us from zero-day vulnerabilities in third-party libraries we rely on. To move forward we needed to get hacking.
Dropbox maintains a great tradition for getting back to our innovative roots. Every year we hold an in-house hackathon aptly named Hack Week. During Hack Week most company employees pause regular work and focus on any project they wish. However, projects that move the company forward are encouraged.
During the 2017 Hack Week, I experimented with using PDFium as a Previews backend. PDFium is the engine that powers Chrome’s PDF Viewer, and is based on the battle-tested Foxit PDF SDK. However, since it is designed for client use, it wasn’t clear whether PDFium would be suitable for server-side usage. Ultimately, I put together a prototype that was much faster than our PDF.js based viewer. At the Hack Week presentation expo it earned the Cornerstone award for a project that makes significant contributions to Dropbox’s foundation, be it product stability, trustworthiness, or performance.
The team then began comparing the benefits and risks of the two approaches. It soon became clear that PDFium allowed us to do a lot better than PDF.js. We could build a PDF viewer from scratch, designed specifically for Dropbox, and it would be secure, fast, and easy to develop. The render quality of PDFium surpasses PDF.js on many documents, especially those that use obscure PDF features. Extracting text as well as positioning it correctly is trickier than with PDF.js, but still feasible. We decided to go all in on PDFium with a project called QuickPDF.
QuickPDF consists of two components: a server-side renderer that splits the PDF into parts, and a client-side viewer that reconstructs the file from those parts and displays them in the browser. The two parts are intentionally decoupled, in case we find a better solution than PDFium.
On the server side, we wrote a statically-linked binary in C++ that uses a modified version of PDFium to render each page of the PDF as a PNG image. It parses the metadata, including the page number and page dimensions, and serializes them as JSON. It also extracts the text and positioning information by grouping adjacent characters with the same font and size on the same line into text boxes. Each text box is represented by an object that stores the text, position, width, font size, and font family. The binary is executed by our file conversion system inside the secure jail, and the results are cached.
The client side is a React application. It fetches the document metadata. Using page count and page size, it draws a skeleton of the document on screen. As the user scrolls, it fetches the pages visible from the server. Each page consists of an image and a transparent layer of text that enables text selection. Hot Areas—implemented in PDF as Annotations—are also rendered, to enable clickable links.
Drawing the text overlay accurately is the most difficult part. Without enough precision, the text selection can be flaky. To make matters worse, the user can freely zoom the page, complicating the positioning. After some careful study of the PDF Standard, we decided to draw the text layer at 72 DPI, the native resolution of PDF, and scale up or down as required. For each text box, we use the original font when available, and substitute a similar one otherwise. The text is then created at the specified size. After the box is drawn on screen, the width is measured and the entire box is stretched to the width specified. This also takes care of different kerning. The box is then rotated and translated to its position on the page.
To make sure the application was fast enough, the team employed multiple optimization techniques. Requests for text and metadata are batched as much as possible. Pages are “over scanned,” i.e. we render more pages than are currently visible, so that the user doesn’t have to wait for a page to download. Since text overlay rendering is expensive, we defer it until the user is no longer scrolling. Using these techniques, we are able to achieve a butter-smooth experience across all supported browsers.
QuickPDF proved to be a huge success. Our 75th-percentile Time to Interactive was reduced by half. The biggest improvement came with PowerPoint files. These files embedded with high-resolution graphics and video are very large. Previously, a significant percentage of our users would leave the preview before it could be rendered. After implementing QuickPDF our PowerPoint preview success rate dramatically improved through lower abandonment.
The methods above are the ones that worked for us. Needless to say, we also experimented with several approaches that didn’t work so well. Through trial-and-error the team learned a lot of valuable lessons that we believe apply to everyone who cares about performance.
Measure. Measure. Measure. In early stages of development we suffered from a lack of reliable metrics. Different loggers returned conflicting results, and we didn’t have a detailed breakdown to guide our optimization efforts. As we rolled out different experiments and fixed the logging, we realized that the lack of historical data made it very hard to measure effectiveness. After that discovery, logging became the top priority for each project.
Define Metrics and Goals Carefully. While we decided early on to optimize for 75th-percentile Time to Interactive (p75 TTI), it took us some time to define, scientifically, what this metric would represent. For example, how would we define interactive? Should we measure Cold Starts, i.e. new users, and Warm Starts, i.e. those who had visited the site previously, and thus have most of our resources cached separately? What would the 75th-percentile cover, hours, days, or weeks? Due to limitations of our early logging, the number we reported was a weighted average of p75 across different file formats. Is this acceptable? Having exact agreement on what the different elements of each metric means is crucial, both to having engineers on the same page, and also for communicating with external stakeholders.