A Programmer’s Guide to Microservices & SOA – Zenefits Engineering


SOA, or Service-Oriented Architecture, has been one of the buzzwords among architects and senior developers, appearing commonly in job descriptions over the last few years. However, most definitions of SOA online are riddled with formal language, such as this one from OASIS: “A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations.”

The above definition, while precise, is quite abstract. This post explains what constitutes a [micro-]service oriented architecture and how it differs from the traditional monolithic approach a programmer may be accustomed to. It is introductory material aimed at SOA beginners who have already worked on some monolithic projects.


A very informal way to understand microservices is to imagine splitting every class in our design into an HTTP-accessible webservice of its own. We would end up with a bunch of services, which together constitute a microservices-based architecture. The difference between SOA and microservices is just the level of granularity to which you decompose the classes of a monolithic application into independent HTTP services. The more minimal each service's functionality is, the closer it comes to being a microservice.

Splitting a single application into multiple services imposes a few restrictions on our coding, but in turn gives us a lot of flexibility and power in scaling. Let us look at some of the coding/design constraints.

Stateless Systems

The fundamental difference from a monolithic design is in how state is maintained. Individual classes that previously interacted via global variables (for locks, mutexes, config variables, etc.) can no longer rely on them.

Let us take a simple example. We are building an application for shopping with just one type of item for sale. The Shopping application has two parts: the Inventory part that adds new items and the Sales part that removes items. Let us consider the following pseudo-code:

In the above code snippet (trivialized for brevity), we have a global variable stockItemCount, which is protected by a mutex mu. The AddToStock function of the Inventory class/type adds to this global variable, whereas the UpdateStock function of the Sales class/type subtracts from it. The mu lock synchronizes access, so that each function has exclusive access to the global variable while executing.

In an SOA, the Inventory and Sales classes become their own individual HTTP webservices. These new classes, viz. InventoryService and SalesService, may now run on different machines.

Inter-Service Co-ordination

So how do these different services potentially running on different machines share and synchronize access to common data? The solution is simple. We move away from the globalVariable + mutex pattern and implement a publish-subscribe or queueing pattern. What does that mean?

We move the stockItemCount management into a separate StockService, which is accessible by both the InventoryService and the SalesService (the classes/types we considered earlier). Let us take a look at a sample pseudo-code:

As seen above, the two classes are converted into services (SalesService and InventoryService), and there is a new third service named StockService. We also use a Q (a distributed queue infrastructure). We have an Operation class/type with a Type string, instances of which we add to the Q. The AddToStock function of the InventoryService puts a new Operation of type “Add” on the queue, whereas the UpdateStock function of the SalesService puts one of type “Remove”. The StockService has a ProcessQ function that loops forever, fetching items from the Q and, based on the Type of the operation, either adding to or subtracting from the count.

It should be clear now that the SalesService and the InventoryService are totally stateless. They just use the Q to communicate with the StockService. The meticulous reader may have observed that the StockService is still stateful: we still maintain the count as a global variable. In any large-scale system, there may be some components that cannot be made completely stateless. Such stateful parts come with drawbacks, which we will discuss in a later section.

The Q forms a very central part of the above architecture. The Q could be implemented by the programmer manually, and could potentially be deployed on a totally different set of machines from the services themselves. However, there are stable queue implementations we could use instead of reinventing the wheel, such as Apache Kafka or RabbitMQ. If you want a hosted solution, AWS offers Amazon SQS and Google offers Cloud Pub/Sub. These systems are often called messaging middleware.

There are projects where a massive amount of data is generated (say, from sensors instead of humans) and realtime processing of streaming data is needed. For these we could use specialized streaming middleware such as Apache Storm, or a hosted solution such as Amazon Kinesis.

Benefits of SOA

As we just saw, what was a simple single process with two classes became three different services plus a queueing system: four different processes across four (or more) different machines, to accommodate SOA. Why should a programmer put up with so much complexity? What do we get in return? Let's explore some of the benefits in this section.

Horizontal Scalability

Let's say we have a server with 4GB RAM serving 100k requests per second for our Shopping site. Due to an upcoming holiday season, we estimate that visitor count will increase and we will need to serve 400k parallel requests per second.

We could do one of two things:
(1) Buy more expensive hardware, say a machine with 16GB RAM, move our site to this bigger machine for the holiday season, and move back to the old system later.
(2) Launch three more 4GB RAM machines and spread the increased load across them.
The former is called Vertical Scaling and the latter is called Horizontal Scaling.

Vertical scaling is appealing for small workloads but becomes costly as we provision ever larger machines. Even if you can rent high-end VMs in the cloud, the pricing is not friendly. Horizontal scaling is easier on your wallet while also providing more throughput and allowing for more dynamism.


In our Shopping application, the SalesService and the InventoryService are stateless, so we can horizontally scale them individually. For example, we could launch three new instances of the SalesService to handle holiday traffic while keeping a single machine for the InventoryService. This kind of flexibility would not have been possible with our earlier monolithic design. However, note that the StockService is stateful, so it cannot be horizontally scaled. This is the drawback of having stateful components in your architecture.

Once we know that the system can be horizontally scaled, the next logical progression is to make the scaling automatic. Systems like AWS Elastic Beanstalk and Google App Engine (to a certain extent) allow your application to scale horizontally by launching new instances automatically whenever demand increases, and shutting them down when bursts of traffic subside. This removes a huge IT administration overhead. All of these nice features are possible only because our application architecture is composed of stateless services.

Serverless Systems

The next step in the evolution of auto-scaling is having code that automatically decides how many servers it should run on, without us provisioning anything. To quote Dr. Werner Vogels, CTO of Amazon, “No server is easier to manage than no server.” We are clearly moving in this direction with serverless webapps. AWS Lambda brings this functional programming dream to life, and Google has recently entered this space with the launch of Cloud Functions. There are already frameworks for building entire suites of applications on these services, without servers.

Polyglot Development

Since we deploy each service independently, we can use different programming languages, frameworks, and technologies for each one. For example, a CPU-intensive service could be written in a performant language like Go, while the front-end code could be written in React or Ember served by Node.js.

Mobile First

Since we have developed proper HTTP APIs for our application, any mobile client can use our webservices in addition to the webclient. Today, many companies start with a mobile-first or mobile-only strategy and do not require a webclient at all. Some pro-monolith engineers argue that the first iteration of development should be monolithic, since development speed is faster, and that we can re-engineer for SOA at a later stage. Personally, I disagree. Starting with SOA in mind from scratch, our modern development stack lets us plumb existing pieces together instead of reinventing the wheel, and frameworks and techniques exist to auto-generate a lot of code once the APIs are finalized. I have built web applications starting from both a monolith and from SOA from scratch, and I have been happier with the SOA code every time. YMMV!

Auxiliary Parts

When building an SOA-based system, we need many more auxiliary support systems. Without these auxiliary parts in place, it will be very difficult to measure, debug, or optimize the system. Different companies implement different subsets of the parts below, based on their business needs and deadlines.

Performance Metrics

The most important auxiliary aspect of SOA is having precise performance metrics for each of the services. SOA without performance measurement is as ineffective as a weight-loss regimen without a diet plan. Without measurement, we cannot rate-limit requests, prevent DoS attacks, or understand the health of a service.

Performance measurement can be done in two ways:
(1) Measure performance and expose metrics via realtime event monitoring.
(2) Log various events, errors, response times, etc., then aggregate and batch-process these logs later to understand the health of the various components.
Any large-scale system will need a combination of both approaches.

Luckily, there are plenty of tools, services, and libraries available for this. AWS API Gateway is perhaps the easiest tool for registering your APIs and monitoring endpoints. However, we may also need more fine-grained measurements (such as how long database calls take, which users cause the most load, and at what times load peaks). There are various tools we could use, such as statsd, Ganglia, and Nagios, and various companies offer hosted solutions too, such as Sematext, SignalFx, and New Relic.
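As a minimal illustration of approach (1), the sketch below records per-call latencies in process; in a real system the timings dict would be replaced by a statsd or hosted-metrics client (the decorator and the metric name are hypothetical, not from any particular library):

```python
import time
from collections import defaultdict

timings = defaultdict(list)  # in-memory stand-in for a metrics backend like statsd

def timed(metric_name):
    """Record how long each call to the wrapped function takes."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[metric_name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("stockservice.list_items.latency")
def list_items():
    return []  # placeholder for the real request handler
```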

Distributed Tracing

Tracing is a concept supplementary to metrics and performance measurement. When a new request comes to a service, that service may in turn call 3-4 other services to serve the original request, and those services may each call 3-4 more. Tracing helps us find out, on a per-request basis, the map of which services were used to serve it, how long it spent at each point, and where the request got stuck if it could not be serviced.

We achieve tracing by assigning a unique id / context object to each new incoming request in the outermost API that receives it, and passing it along with every further API request until the final response is finished. This context can be passed as a parameter in the webservice calls. The monitoring of tracing events can again be realtime or derived from log aggregation.
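A sketch of that id-passing in Python (the header name X-Request-ID and the function names are illustrative assumptions, not from any standard):

```python
import uuid

def handle_incoming(request_headers):
    # Outermost API: reuse the caller's trace id if present, otherwise mint one.
    trace_id = request_headers.get("X-Request-ID") or str(uuid.uuid4())
    return call_downstream("StockService", trace_id)

def call_downstream(service_name, trace_id):
    # Every further call carries the same trace id so events can be correlated.
    headers = {"X-Request-ID": trace_id}
    # ... the real HTTP call to service_name would go here, with these headers
    return headers
```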


Pagination

Assume we expose an API on our StockService to list all the items we have, along with their RetailPrice. If we have, say, a billion products, the response to that API call will be huge. And not just the response: the server-side resources needed to build it will be tremendous. If we fetch a billion items from the database, the caches will be thrashed, the network will be clogged, and so on. To avoid these issues, any API that could potentially list a lot of items should paginate its response: the call should take a page number as a parameter and return only M items per page. The value of M can be decided based on the size of each item in the response. Optionally, we can also accept the number of results the user wants as an HTTP parameter.

For example (the URLs here are illustrative):
http://blogservice/posts?label=tech – Returns the first 10 blog posts with label “tech”
http://blogservice/posts?label=tech&page=1 – Same as above
http://blogservice/posts?label=tech&page=2 – Returns blog posts 11 to 20 with label “tech”
http://blogservice/posts?label=tech&count=5 – Returns the first 5 blog posts with label “tech”
http://blogservice/posts?label=tech&page=4&count=5 – Returns blog posts 16 to 20 with label “tech”
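Server-side, the pagination itself is simple slicing; a sketch with hypothetical page and count parameters (pages are 1-indexed):

```python
def paginate(items, page=1, count=10):
    # Return only the requested page of results instead of the full list.
    start = (page - 1) * count
    return items[start:start + count]
```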

API Versioning

If software never changed, we software engineers would be out of jobs. Software evolution is good. However, we need contracts/APIs so that changes are smooth and do not bring down the entire ecosystem. Once we have exposed an API outside our developer team, we would be wise to finalize its request/response parameters.

In our StockService example that we discussed above, we could have the following API:

http://stockservice/items/ – Returns all the items.

Let's presume someone later decides that it is unwise to always return all the items and changes the behavior to return only the first 10. This change would break all existing clients, which would assume there are only 10 items in total while in reality a billion more are waiting to be paginated.

The easiest way to regulate API changes is to add a version to the API. For example, if the original API to return all the items had a version component, we could just increment it:

http://stockservice/V1/items/ – Returns all the items
http://stockservice/V2/items – Returns the top 10 items

The version need not always be part of the URL. We could instead pass the version as an extra HTTP header rather than creating a new URL endpoint. It is a matter of taste, and each approach has its own pros and cons.
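If the version travels in a header, the dispatch might look like this (the header name X-API-Version is an illustrative choice, not a standard):

```python
def handle_items(request_headers, all_items):
    # Default to the oldest contract so existing clients keep working.
    version = request_headers.get("X-API-Version", "1")
    if version == "1":
        return all_items      # V1 contract: return everything
    return all_items[:10]     # V2 contract: return only the top 10 items
```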


Circuit Breakers

Once we have multiple components in a system, there is a high chance that some part of it will be down for updates at any given time. When that happens, a calling service can choose to wait for some time before retrying, if it knows that the downstream service will keep failing. Martin Fowler has written in detail about this Circuit Breaker pattern, and it is a good read.
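A minimal sketch of such a breaker (the class, thresholds, and names are illustrative; a production system would more likely use an existing library such as Hystrix):

```python
import time

class CircuitBreaker:
    """Skip calls to a failing service for a cool-down period."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None    # timestamp when the breaker tripped, if any

    def call(self, fn):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: not calling the failing service")
            self.opened_at = None    # cool-down over: allow a trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()    # trip the breaker
            raise
        self.failures = 0    # a success resets the failure streak
        return result
```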


Client SDKs

A new TCP connection takes time to establish because of the initial handshake, so it would be foolish not to reuse connections. There is also an inherent need in HTTP to retry failed requests before giving up, and not every programmer enjoys writing that client code. It is therefore often recommended to release SDKs for the APIs we publish, so that programmers can consume them easily. For example, a Python programmer can merely import our SDK's classes to add an item to our StockService, instead of having to write HTTP retry code.
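A sketch of what such an SDK class might hide from its callers (the class name, endpoint, and backoff policy are all hypothetical; the transport function is injected so the retry logic stays visible and testable):

```python
import time

class StockServiceClient:
    """Hypothetical SDK: wraps HTTP calls with retries so consumers never write that code."""
    def __init__(self, base_url, fetch, retries=3, backoff=1.0):
        self.base_url = base_url
        self.fetch = fetch        # injectable transport, e.g. requests.get
        self.retries = retries
        self.backoff = backoff

    def list_items(self, page=1):
        url = f"{self.base_url}/items?page={page}"
        for attempt in range(self.retries):
            try:
                return self.fetch(url)
            except ConnectionError:
                if attempt == self.retries - 1:
                    raise    # retries exhausted: surface the failure to the caller
                time.sleep(self.backoff * (2 ** attempt))    # exponential backoff
```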

In the past we had technologies like DCOM, CORBA, and RMI that aimed to do distributed computing within walled gardens of technology. They lost market share to the simplicity of REST services, where HTTP verbs (GET, PUT, POST, DELETE) perform remote operations without the need for complex and mostly platform-specific stubs and skeletons.

There is a common middle ground that takes the best of both worlds. The most notable framework here is gRPC, an open source project from Google adopted by many companies (most recently CoreOS), which provides a web API where client SDK generation is easy, with HTTP/2 support as well.


If these kinds of challenges interest you, Zenefits is moving away from a monolithic app to a microservices based architecture. We’re hiring!

– Sankar P
