Fall is here! Change is in the air. And as of last month, Kickstarter runs its two biggest applications on Rails 5: our payments app, Rosie, and our main app, Kickstarter. In this post, we’ll concentrate on the Kickstarter Rails 5 upgrade.
We update our apps because we want to ensure the security and performance of our site — but getting to the next major Rails version on a nine-year-old legacy application was no easy feat. We’d like to share some of the lessons learned during this upgrade by breaking down our upgrade process and discussing some of the issues we ran into along the way.
First, we’ll discuss the planning and scoping by taking a look at how we defined success and how we organized the team. Then, we’ll walk through the twelve steps that got us to Rails 5.
Timeline, Team, and Success Metrics
At the outset, we scoped about three months for this upgrade. We’re happy to report that we successfully hit that mark.
This project was scoped for one person to lead and implement, but included invaluable help across our Ops, Platform, and other engineering teams. Across upgrades, we refer to ourselves as the #upgraderz and have a GitHub team and Slack channel to communicate any issues that arise during the upgrade. Without this team of #upgraderz, a major app upgrade would not have been possible.
During the upgrade process, we upgraded Rails versions for both our payments and Kickstarter apps, as well as the Ruby version of the Kickstarter app. We based success metrics on:
- A timely upgrade
- Limited downtime due to 4.2-compatible upgrades
- Limited downtime due to the upgrade itself
- Cross-training and education
We’ve broken our process down into twelve steps to replicate on future upgrades.
Step 1: Incremental upgrades
The first step we took was to try to release everything but the major version. We upgraded Rails from 4.1.x to 4.2.x back in April 2016 and developed a good team that understood some of the potential challenges we might run into when upgrading to Rails 5. Then, over the summer, we kept up with minor patch upgrades. Before tackling 5, we upgraded Rails 4.2.8 to 4.2.9.
Finally, we took this opportunity to upgrade Ruby as well. At the start of the project, we took about half a week to upgrade Ruby 2.2.5 to 2.4.x. The benefits of this were twofold. First, gem dependencies would sometimes bump to a version that supported Ruby 2.4 and Rails 5, and if we didn’t have both, this would be problematic. Second, this upgrade provided good practice for releasing a new version. One lesson we learned, for example, was to always upgrade to the most recent patch release. We learned this after upgrading to 2.4.0 instead of the most recent patch, which cost a developer a day chasing a bug that turned out to be caused by intermittent errors in MRI 2.4.0 involving def_delegators. Lesson learned!
Step 2: Scope work based on the release notes
Both the Rails upgrade guide and release notes are well documented. This allowed us to start building a good project outline. We took the release notes for each class and broke them down into three categories — non-master–compatible change, master-compatible change, and deprecation change — based on our knowledge of the codebase and reading of the upgrade docs.
Careful documentation of our work throughout this process was integral to its success. We kept a master Trello card on the Ops Engineering board with tasks broken into categories: dependency upgrades, master-compatible upgrades, non-master–compatible upgrades, and non-necessary upgrade TODOs. We also carefully labeled pull requests associated with the upgrade, using labels such as post-rails5. For bigger changes necessary in the upgrade but not compatible with master, we opened pull requests off the Rails 5 branch itself.
Step 3: Just upgrade!
After documenting our way forward, we tried to just install the gem! We added the new version to the Gemfile and… immediately ran into a million dependency issues. Not only were there gem changes for dependencies where we had to track down changelogs, but we had to open many PRs in our custom gems or gems we relied on to make them compatible. It took a week just to get the Rails 5 gem properly installed with no dependency errors. In the process, we went through our entire gem list and read every single changelog to make sure we were on the best possible version. Ready4Rails was a really helpful site for navigating these gem upgrades, but any Rails developer will know that a nine-year-old Rails app’s Gemfile can be a treasure trove of mysterious custom dependency conflicts.
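Most of these conflicts come down to version constraints: a gem whose gemspec caps Rails below 5 blocks the whole bundle. As a minimal sketch, RubyGems' own Gem::Requirement can show which constraints would reject a Rails 5 release. The gem names and constraints below are hypothetical and only illustrate the idea.

```ruby
require "rubygems"

# Hypothetical gems and the Rails constraint each one declares.
# These names and constraints are invented for illustration.
DECLARED_RAILS_CONSTRAINTS = {
  "legacy_admin_gem" => "~> 4.2",
  "flexible_gem"     => ">= 4.0",
  "capped_gem"       => [">= 4.1", "< 5.0"],
}.freeze

# Returns the names of gems whose constraint rejects the target Rails version.
def blocking_gems(constraints, target)
  version = Gem::Version.new(target)
  constraints.reject do |_name, requirement|
    Gem::Requirement.new(*Array(requirement)).satisfied_by?(version)
  end.keys
end

puts blocking_gems(DECLARED_RAILS_CONSTRAINTS, "5.0.6").inspect
```

Each gem this reports would need either a newer release or a pull request relaxing its constraint before the bundle could resolve.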
Once we’d upgraded the gem, we were able to run Rails’ handy upgrade script, which gave us a diff of all the Rails 5 config changes. We opened an initial PR against our Rails 5 branch for the team to review, since many of these changes would be the most central to our infrastructure and being aware of any challenges early on would be helpful.
Step 4: Merge low-hanging Rails 4.2 compatible changes
Merge any low-hanging fruit in master. This is all the stuff that got missed in previous upgrades or easy fixes for deprecation warnings in Rails 5. Things like switching to assert_nothing_raised in Rails 5 were easy and digestible fixes.
Step 5: Make the build green
This was the longest part of the process. After merging all low-hanging fruit to master, we embarked on turning our CI green. We took a snapshot of code coverage before tackling these failures to give us a sense of what making the tests pass would get us. We started with nearly 4,000 unit and integration test failures in our CI pipeline with a code coverage of about 70 percent. This was really helpful for knowing how confident we could be in our test suite.
Again, we approached this task with solid documentation. Many of the test failures could be grouped together and patterns identified. We kept track of all the moving test failures and which test fixes created even more failures. This documentation process took the form of namespacing PRs (e.g., pre-rails5) and spreadsheets sent to the team.
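Grouping failures by pattern can be partly automated. The sketch below is one possible approach, not the tooling we used: it strips details that vary between otherwise identical failures (object addresses, numbers) and counts what remains. The failure messages are invented for illustration.

```ruby
# Hypothetical failure messages, invented for illustration.
FAILURES = [
  "NoMethodError: undefined method `symbolize_keys' for #<ActionController::Parameters:0x007f8a1c>",
  "NoMethodError: undefined method `symbolize_keys' for #<ActionController::Parameters:0x007f9b2d>",
  "ArgumentError: wrong number of arguments (given 3, expected 1)",
  "ArgumentError: wrong number of arguments (given 2, expected 1)",
  "ActiveRecord::RecordNotFound: Couldn't find Project with 'id'=42",
].freeze

# Replace object addresses and numbers so similar failures share one pattern.
def normalize(message)
  message.gsub(/0x[0-9a-f]+/, "<addr>").gsub(/\d+/, "N")
end

# Returns [pattern, count] pairs, most frequent patterns first.
def group_failures(messages)
  messages.group_by { |message| normalize(message) }
          .map { |pattern, group| [pattern, group.size] }
          .sort_by { |_pattern, count| -count }
end

group_failures(FAILURES).each do |pattern, count|
  puts "#{count}x #{pattern}"
end
```

Fixing the most frequent pattern first tends to shrink the failure list fastest, which is why the counts are sorted in descending order.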
We tried to treat fixing tests as a good opportunity for refactors as well. If we found problematic tests or logic, we took a little extra time to alleviate this tech debt along the way.
Step 6: Remove all noticeable deprecation messages in the CI
Once the test suite was completely green, we looked solely at the changelog for deprecated features of the Rails API and tried to fix as many of those as possible. We also used our unit tests to surface a lot of these. Some of them are hard to miss. For example, Rails 5 controller tests use kwargs in ActionController::TestCase and ActionDispatch::IntegrationTest HTTP methods. This means that every line that made an HTTP request with the old positional arguments emitted a deprecation warning. There were thousands of these across hundreds of files in our test suite. This was definitely the most tedious step in the process, but it was incredibly rewarding to see clean test runs.
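To make the keyword arguments change concrete: a Rails 4.2 controller test written as get :show, { id: 1 }, { user_id: 2 } becomes get :show, params: { id: 1 }, session: { user_id: 2 } in Rails 5. The plain-Ruby sketch below does not load Rails; positional_to_kwargs is a hypothetical helper that only shows how the old positional slots map onto the new keyword names.

```ruby
# Plain-Ruby illustration of the Rails 5 test signature change.
# Rails 4.2 style: get :show, { id: 1 }, { user_id: 2 }
# Rails 5 style:   get :show, params: { id: 1 }, session: { user_id: 2 }

# Hypothetical helper: maps the old positional arguments of a controller test
# request (params, session, flash) onto the Rails 5 keyword names, dropping
# any slot that was not supplied.
def positional_to_kwargs(params = nil, session = nil, flash = nil)
  { params: params, session: session, flash: flash }.reject { |_key, value| value.nil? }
end

p positional_to_kwargs({ id: 1 }, { user_id: 2 })
```

Because the mapping is this mechanical, the bulk of the change lends itself to a scripted rewrite, followed by manual review of the edge cases.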
Step 7: Smoke test the site on major flows, fix any issues that arise
After the first six steps, we were about halfway done with the process. Since tests do not cover everything and also often mock out critical flows, we started to smoke test some of the major flows and ran into several problems. Over the last nine years, we’ve developed quite a few patches and “creative” fixes that rely on Rails logic that is subject to change.
Step 8: Now that the tests pass and there aren’t deprecations, backport everything
We tried to backport as much as we could, including Rails 5-specific patches we wanted input on. We did this using tags and namespaces to the Rails 5 major version. This allowed us to stick with incremental changes and debug any challenges that came up before the entire gem was updated in master. We merged over thirty pull requests to master before even opening the Rails 5 pull request.
Step 9: Find reviewers
Once there was nothing else to merge to master, we opened a PR and heavily documented it for reviewers. We included documentation about why the change needed to be made, as well as a justification for why it couldn’t be made in master. In the end, the change was 300 files across our app code — a sizable change. However, most of the changes were due to test files that needed to be updated because of the keyword arguments change. By breaking down the context of the change, we were able to specifically direct reviewers to a subset of the files to review.
In the weeks before we opened the PR, we reached out to various feature teams and asked for a representative to help review it. We recruited them as smoke testers and reviewers to look through the site and test the major flows that they were the most familiar with. When we opened the PR, we charged these engineers with specific responsibilities, ensuring that the PR wouldn’t sit open indefinitely.
Step 10: Set aside a day to push the branch to staging and smoke test there
Finally, when we were confident in our flows and test suite, we decided to take a day when no one was deploying and deploy our branch to staging. Our staging is slightly different from our development environments and deployed development environments in a couple of ways. First, it connects directly to a sandbox payments system. We were getting a lot of false payments timeouts in our deployed development environments that went away in the staging environment. Second, our staging config is much closer to our production config than the development config.
Another successful idea was to deploy to staging on the Sunday before we deployed to production. We were able to get out of the way of developers and were free to test things on staging without disrupting the development workflow.
Finally, deploying to staging allowed us to practice the worst case scenario: a rollback. The ability to perform a rollback to a previous Rails version illuminated a critical flaw. We had forgotten to change the cache version for Rails 5 to avoid Marshal conflicts. Ecstatic that we caught this early, we were able to implement a fix for production and practice the rollback on staging successfully.
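The idea behind a cache version is that entries written under one Rails version are never read under another, so neither an upgrade nor a rollback has to deserialize an incompatible Marshal payload. The sketch below illustrates this with a plain Hash standing in for the shared cache; VersionedCache and the namespace strings are hypothetical, not our production setup.

```ruby
# Hypothetical cache wrapper that prefixes every key with a version namespace.
class VersionedCache
  def initialize(store, namespace)
    @store = store
    @namespace = namespace
  end

  # Serializes the value with Marshal, as many cache stores do.
  def write(key, value)
    @store[namespaced(key)] = Marshal.dump(value)
  end

  # Returns nil on a miss instead of touching another version's payload.
  def read(key)
    payload = @store[namespaced(key)]
    payload && Marshal.load(payload)
  end

  private

  def namespaced(key)
    "#{@namespace}:#{key}"
  end
end

shared_store = {} # stands in for a shared cache server
rails4_cache = VersionedCache.new(shared_store, "rails-4.2")
rails5_cache = VersionedCache.new(shared_store, "rails-5.0")

rails4_cache.write("project/1", { name: "Example" })

# The Rails 5 side sees a miss and never loads the Rails 4 payload.
p rails5_cache.read("project/1") # prints nil
```

In a Rails app, one way to get a similar effect is the namespace option that ActiveSupport cache stores accept, set to a value that changes with the Rails version.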
Step 11: Merge, deploy, and wait for things to break
The Monday morning after our Sunday staging deploy, we merged and deployed Rails 5 to production. We ensured Rails subject matter experts were around in case anything broke. Immediately after the deploy, things started to break. We quickly triaged the things that were breaking and made calls on what needed to be fixed and in what order. Our careful planning and contingency plans allowed us to work calmly in the face of these issues. Our cross-training and preparedness allowed us to spot issues quickly and identify fixes on the spot. Overall, we suffered no major downtime and very few user-facing issues.
Step 12: Post-upgrade tracking and review
After a successful merge and deploy, we continued to keep track of any issues that arose and worked closely with our community support team and support engineers to ensure that any Rails 5-related bugs were escalated to the team as quickly as possible.
We concluded the project with a Project Retrospective with the entire engineering team, as well as a representative from community support, to discuss issues and reflect on this process. In this retrospective, we revisited our success metrics as a team:
- A timely upgrade
Success! We hit our three-month timeline.
- Limited downtime due to 4.2-compatible upgrades
Minor outages, but handled with no rollbacks. The worst bug probably came from upgrading to Ruby 2.4.0 rather than the most recent patch.
- Limited downtime due to the upgrade itself
We did not have to roll back any part of the upgrade, but we did experience some problems with untested bugs, which resulted in a spike in Zendesk tickets. We could have done better on this metric by developing a better plan for triaging the most important user-facing bugs as they came in.
- Cross-training and education
The team learned a ton, and having the retrospective at the end ensured that we carried lessons learned over to the next upgrade.
While we tried to be as thorough as possible in fixing bugs before merging Rails 5, there are always going to be things you don’t anticipate. The key is documentation and planning.
The biggest problem we experienced throughout the process was with a relatively straightforward change in the release notes: ActionController::Parameters no longer inherits from HashWithIndifferentAccess. Our reliance on ActionController::Parameters inheriting from a Hash was incredibly widespread in ways that were hard to systematically track down without reading each file. Fortunately, unit, integration, and smoke tests caught most of them. But there were production-specific issues, like tracking events, that were unknown unknowns to us. On the day of deployment, this was definitely our most pervasive remaining issue.
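The failure mode looks roughly like the sketch below. It is plain Ruby and does not load Rails: WrappedParameters is a stand-in that wraps a hash rather than inheriting from one, and track_event is a hypothetical helper of the kind that quietly assumed a Hash. In a real Rails 5 app the corresponding fix is to call permit on the parameters and then to_h before handing them to code that expects a Hash.

```ruby
# Stand-in for the Rails 5 shape of ActionController::Parameters: it wraps a
# hash instead of inheriting from one. This is NOT the Rails class.
class WrappedParameters
  def initialize(hash)
    @hash = hash.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
  end

  # Indifferent read access still works for single keys.
  def [](key)
    @hash[key.to_s]
  end

  # Hypothetical shortcut for permit(...).to_h: returns a plain Hash
  # containing only the allowed keys.
  def permit_to_h(*keys)
    allowed = keys.map(&:to_s)
    @hash.select { |key, _value| allowed.include?(key) }
  end
end

# Hypothetical tracking helper written in the Rails 4 era: it assumes a Hash.
def track_event(name, properties)
  raise ArgumentError, "properties must be a Hash" unless properties.is_a?(Hash)
  { event: name, properties: properties }
end

params = WrappedParameters.new({ project_id: 7, utm_source: "newsletter" })

begin
  track_event("pledge", params)
rescue ArgumentError => error
  puts "Rails 4 assumption broke: #{error.message}"
end

# Converting explicitly to a Hash restores the old behavior.
p track_event("pledge", params.permit_to_h(:project_id))
```

Call sites like this are hard to find by searching, because nothing in them mentions the parameters class by name.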
Another unknown unknown was how our API changes were going to impact our mobile applications. While we had extensively smoke tested flows on our web application, we did not have any member of the native team on our #upgraderz team. In hindsight, this was a mistake. Because the leaders of the project were less familiar with the native applications, it was not prioritized. While only a few small bugs specifically impacted mobile applications this time, next time we will prioritize getting input from members across the engineering team.
We are already behind where we could be. However, we feel we are in a good place for now and are proud of our efficient ascent to this next major version of our app.
I learned a ton during this project and could not be more proud of our team. While we did spend many days hitting our heads against technical walls trying to dream up ways around various bugs, our countless hours of planning, cross-team communication, and documentation made this project a great success. Overall, our biggest piece of advice to anyone looking to upgrade a legacy application is to build planning and documentation into the project early on and actively communicate with stakeholders during all phases of the project.
I hope you enjoyed and learned something from this post. If you have any questions, don’t hesitate to reach out.
Always be upgrading!