Learnings from the journey to continuous deployment


Modes of RTF

    1. Record and replay: In this mode, calls to external services are recorded and then replayed during test execution, so tests can run without live dependencies.

    2. Simple integration test: This mode enables developers to mock the calls to external services programmatically in the test code, as sketched below.
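
To make the simple integration test mode concrete, here is a minimal sketch of mocking an external service call programmatically with JUnit and Mockito. It is not the RTF API itself; ProfileClient and GreetingService are hypothetical names used only for illustration.

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical client for an external profile service.
interface ProfileClient {
    String getDisplayName(long memberId);
}

// Service under test that depends on the external client.
class GreetingService {
    private final ProfileClient profileClient;

    GreetingService(ProfileClient profileClient) {
        this.profileClient = profileClient;
    }

    String greet(long memberId) {
        return "Hello, " + profileClient.getDisplayName(memberId);
    }
}

public class GreetingServiceTest {
    @Test
    public void greetUsesMockedExternalCall() {
        // Mock the external service call instead of hitting the real dependency.
        ProfileClient mockClient = mock(ProfileClient.class);
        when(mockClient.getDisplayName(42L)).thenReturn("Alice");

        GreetingService service = new GreetingService(mockClient);
        assertEquals("Hello, Alice", service.greet(42L));
    }
}
```

Because the external dependency is mocked, tests like this can run during the build step without any deployed services.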

The primary goal of this phase of work was to give developers early feedback so they can identify and fix code issues quickly. One major benefit: the cost of fixing an issue is much lower in the development environment than at a later stage.

Integration testing in the staging environment
A successful build step publishes an artifact that is deployed to the staging environment to detect issues related to dependencies. A suite of tests simulating user scenarios is then executed against the services running in the staging environment. Staging environments are well suited for testing integrations with data stores (both online and offline) and dependent services, and the deployment process continues only if these tests succeed. However, the staging environment can be unreliable and services may be unavailable, because engineers do not constantly monitor service health there. This is why we have multi-stage integration testing, and one reason why integration test frameworks such as the Rest.li Test Framework and Simple Integration Test were developed to execute tests during the build step.
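
As a rough illustration of how such a scenario test might look (not LinkedIn's actual framework), the sketch below calls an endpoint of a service deployed in staging and checks the response; the service name, URL, and endpoint are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class StagingUserScenarioTest {
    // Hypothetical base URL of the service deployed in staging.
    private static final String STAGING_BASE_URL =
            "https://greeting-service.staging.example.com";

    private final HttpClient httpClient = HttpClient.newHttpClient();

    @Test
    public void greetingEndpointReturnsOkInStaging() throws Exception {
        // Exercise a user scenario end to end against the staging deployment,
        // including its data stores and downstream dependencies.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(STAGING_BASE_URL + "/greetings/42"))
                .GET()
                .build();

        HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());

        assertEquals(200, response.statusCode());
    }
}
```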

Canary testing
As part of the deployment process, one last validation certifies the latest version. At LinkedIn, we rely on automated canary testing: one or more hosts are updated to the latest version of the software, and a small percentage of users are routed to these hosts. The analysis runs for a preconfigured duration, and metrics generated on the canary host are compared against metrics generated on a control host. If any regressions or anomalies are detected in the latest version, the change is immediately rolled back so that its impact is limited.

Additionally, we’ve developed solutions to validate the performance of a service in canary testing across metrics like response latency, throughput, and load. 
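
As a simplified sketch of the idea behind canary analysis (not LinkedIn's actual implementation), the snippet below compares a canary host's p99 latency samples against a control host's and flags a rollback when the canary degrades beyond an illustrative threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Minimal sketch of a canary analysis check: compare p99 latency samples from a
 * canary host against a control host and decide whether to roll back.
 * The 10% threshold and single-metric comparison are illustrative only.
 */
public class CanaryAnalysis {

    // Allow the canary to be at most 10% slower than control before flagging a regression.
    private static final double MAX_ALLOWED_DEGRADATION = 1.10;

    public static boolean shouldRollBack(List<Double> canaryLatenciesMs,
                                         List<Double> controlLatenciesMs) {
        double canaryP99 = percentile(canaryLatenciesMs, 0.99);
        double controlP99 = percentile(controlLatenciesMs, 0.99);
        return canaryP99 > controlP99 * MAX_ALLOWED_DEGRADATION;
    }

    private static double percentile(List<Double> samples, double p) {
        List<Double> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int index = (int) Math.ceil(p * sorted.size()) - 1;
        return sorted.get(Math.max(index, 0));
    }
}
```

In practice, the comparison would cover several metrics (latency, throughput, error rate, load) over the configured analysis window, rather than a single fixed threshold.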

Production
In a microservice architecture, a product is composed of multiple services, and a single underperforming service can degrade the user experience. A monitoring dashboard provides visibility into service health and behavior, tracking critical parameters such as system load, API latency, and throughput to assess the health of the software. Additionally, the integration test frameworks described above support running tests in the production environment without affecting system stability.
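
For illustration only (not LinkedIn's monitoring stack), the sketch below shows the kind of per-endpoint counters a service might expose to such a dashboard: request throughput plus average and maximum latency.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/**
 * Illustrative per-endpoint metrics a service could expose to a monitoring
 * dashboard: request count (throughput) and cumulative/maximum latency.
 */
public class EndpointMetrics {

    private final LongAdder requestCount = new LongAdder();
    private final LongAdder totalLatencyMs = new LongAdder();
    private final AtomicLong maxLatencyMs = new AtomicLong();

    /** Record one completed request and how long it took. */
    public void record(long latencyMs) {
        requestCount.increment();
        totalLatencyMs.add(latencyMs);
        maxLatencyMs.accumulateAndGet(latencyMs, Math::max);
    }

    public long requestCount() {
        return requestCount.sum();
    }

    public double averageLatencyMs() {
        long count = requestCount.sum();
        return count == 0 ? 0.0 : (double) totalLatencyMs.sum() / count;
    }

    public long maxLatencyMs() {
        return maxLatencyMs.get();
    }
}
```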

Summary 

By making these changes, we have improved the release cadence of key services from a few releases per week to multiple releases per day. The fundamental first step toward continuous deployment was to develop quality tests and automate their execution during the build step; this safeguards product quality. From there, it is much easier to establish an automated deployment strategy.

Acknowledgments 

This has been an amazing team effort from Anisha Shresta, Ayeesha Meerasa, Yusi Zhang, Walter Scott Johnson, Sajid Topiwala, Gururajan Raghavendran, Alisa Yamanaka, Bill Lin and Graham Turbyne. It would not have been possible to execute successfully without the immense backing and support from Pritesh Shah and John Rusnak. The vision of our team is to achieve continuous deployment for all products within LinkedIn. 

We would also like to thank the rest of the management team, who have constantly been a source of encouragement and support, including Jeff Galdes and Dan Grillo.


