In the land of microservices, the network is the king(maker)

Recently, Goldman Sachs dropped a bombshell, announcing that the bank has “started a yearlong project that will shift about 90% of the company’s computing to containers!” Ignoring for a moment that this move will be largely powered by Docker, this is a momentous move by one of the largest “technology” companies on the planet! It provides more concrete evidence, if any more was needed, that containers, and by extension microservices, have now graduated from web-scale companies to mainstream enterprises, reshaping the infrastructure landscape in a way that was unimaginable just a few years ago!

Microservices represent such a radical departure from monolithic applications! Instead of running a giant blob of code (monolith) inside a virtual machine, much smaller code fragments (microservices) are run inside lightweight containers (Docker, CoreOS rkt) and stitched together by an orchestration service (Mesos, Kubernetes) to deliver the desired application functionality.

But as microservices go mainstream and more developers jump on the bandwagon, the question to ask is whether we have the right infrastructure tools to monitor, manage and secure these truly distributed applications. If not, what would it take to build those tools? And what would those new tools look like?

Why microservices

There are several good reasons why the microservices architecture is gaining such rapid traction across enterprises of all stripes, but here are the top three:

Fast deployment: Code changes for each service can be made independently (as long as the API contract with other services isn’t violated), so build+test+deploy cycles speed up dramatically. Netflix, for example, often deploys code a hundred times in a single day thanks to its early adoption of the microservices architecture!
Efficient scaling: Each microservice can be scaled independently, which is a much more efficient way of scaling an application, because not every part of an application experiences the same amount of load and needs to be scaled equally.
Design autonomy: Developers get the freedom to employ different technologies, frameworks, and design patterns to design and implement each microservice, pursuing a horses-for-courses strategy as necessary.

The microservices tax

Microservices, however, aren’t a free lunch. Part of the “microservices tax” can be chalked up to the operational complexity of a distributed system: more scripts and configurations to deploy, a larger overall memory footprint because of libraries and data stores replicated across services, and potential performance degradation because API calls go over the network. However, an even bigger chunk of the cost stems from the lack of manageability and the loss of control over a distributed application built on a microservices architecture.

Distributed application logic: Unlike a monolith, where the application logic is colocated, a microservices application spreads its logic across the services and, more importantly, embeds it in the control and data flow between those services. While each service does its own thing (receive a request, apply logic, and produce a response), the functionality of the application is realized only by calling the right services at the right time, in the right sequence and with the right data. Furthermore, the same service could serve multiple different applications. Therefore, it is no longer possible to monitor, manage and secure an application based only on telemetry data collected at the level of each service.
Diverse technology stack: Real-world applications are composed of many different services; for example, Amazon uses between 100 and 150 services to build a single page, while Google calls about 70 services for a single search. While many of those services are developed internally, a good many of them use open source libraries and third-party software over which the enterprise has little or no control. To make matters worse, different services may be written in different programming languages as necessary: R, for example, is great for a recommendation engine service, but less so for general-purpose services. Polyglot programming and the lack of control over third-party software make it difficult to instrument the code of each service, which in turn complicates monitoring, management and security.
Limited testability and debuggability: It is harder to design tests for a distributed application because it is difficult to anticipate all possible interactions between the constituent services. Ditto for debugging. As a result, much like the “butterfly effect” in chaos theory, a minor change to an individual service (one that has already passed the service-specific regression tests) could conceivably have a disproportionately large, even catastrophic, impact on the overall application.

Network to the rescue

So, if we cannot instrument the code of each service, and if, even when we can, collecting data at the level of each service won’t let us monitor, manage and secure a distributed application comprehensively, are we doomed?

Well, fortunately, we do have a knight in shining armor: the good, old network — the network that handles the east-west traffic flowing across the different services in a microservices architecture!

The east-west network carries all API calls and associated data between services, and that traffic can be very voluminous: over 99% of the 5 billion API calls Netflix handles per day, for example, are internal (across services)! In addition, this traffic contains rich information on the sequence, timing and data of the various API calls. Therefore, packet-level inspection of that traffic, combined with the application telemetry data collected at each service, should offer an unprecedented X-ray image of a distributed application in execution.
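To make the “X-ray” idea a little more concrete, here is a minimal Python sketch of one thing a network-level tool could derive from east-west traffic: a service dependency graph with the call count and median latency on each caller-to-callee edge. It assumes, hypothetically, that packet inspection has already decoded each API call into an `ApiCall` record (calling service, called service, endpoint path, observed latency); that record format, the service names and the `build_call_graph` helper are illustrative only, not any particular product’s API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ApiCall:
    """One east-west API call as decoded from network traffic (hypothetical format)."""
    src: str           # calling service
    dst: str           # called service
    path: str          # API endpoint, e.g. "/items"
    latency_ms: float  # request-to-response time observed on the wire


def build_call_graph(calls):
    """Aggregate observed calls into a service dependency graph.

    Returns a dict mapping (caller, callee) to the number of calls seen on that
    edge and their median latency in milliseconds.
    """
    latencies = defaultdict(list)
    for call in calls:
        latencies[(call.src, call.dst)].append(call.latency_ms)
    return {
        edge: {"calls": len(values), "median_ms": median(values)}
        for edge, values in latencies.items()
    }


if __name__ == "__main__":
    observed = [
        ApiCall("frontend", "catalog", "/items", 12.0),
        ApiCall("frontend", "catalog", "/items", 18.0),
        ApiCall("frontend", "recommendations", "/recs", 40.0),
        ApiCall("recommendations", "catalog", "/items", 9.0),
    ]
    graph = build_call_graph(observed)
    print(graph[("frontend", "catalog")])  # {'calls': 2, 'median_ms': 15.0}
```

This aggregate view answers “who calls whom, how often, and how fast” without touching any service’s code; reconstructing the exact sequence of calls behind a single user request is a harder problem that this sketch deliberately leaves out.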

If we accept that, then we can also envision how the network enables a whole new class of infrastructure products that monitor, manage (e.g., load-balance) and secure distributed applications built on the microservices architecture. An application performance monitoring solution, for example, could offer a much more granular view of the application by analyzing the control and data flow of API calls across the multitude of services, without instrumenting any code or installing any agent. The same analysis of network traffic could also ferret out malware by detecting anomalies in the API calls between services. Additionally, by overlaying secure virtual networks across the relevant microservices, we could block unauthorized communication between services, or direct traffic to idle service instances for better load balancing. And finally, we could collect user engagement data for the various parts of the application without embedding any explicit calls to a data collection tool in the code.
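The anomaly-detection and unauthorized-communication ideas above can be sketched, under heavy simplifying assumptions, as an allow-list baseline: record which (caller, callee, endpoint) edges appear during a window of traffic assumed to be trusted, then flag any later call that falls outside that set. The `ObservedCall` record, the `EdgeBaseline` class and the service names are hypothetical and for illustration only; a real product would also weigh call rates, timing and payloads, not just whether an edge has been seen before.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ObservedCall:
    """An inter-service API call seen on the network (hypothetical format)."""
    src: str   # calling service
    dst: str   # called service
    path: str  # API endpoint


class EdgeBaseline:
    """Allow-list of (caller, callee, endpoint) edges learned from trusted traffic."""

    def __init__(self, trusted: Iterable[ObservedCall]):
        self.allowed = {(c.src, c.dst, c.path) for c in trusted}

    def is_anomalous(self, call: ObservedCall) -> bool:
        # Any edge never seen during the trusted window is treated as suspicious.
        return (call.src, call.dst, call.path) not in self.allowed

    def partition(
        self, calls: Iterable[ObservedCall]
    ) -> Tuple[List[ObservedCall], List[ObservedCall]]:
        """Split live calls into (permitted, flagged); flagged calls could be blocked or alerted on."""
        permitted, flagged = [], []
        for call in calls:
            (flagged if self.is_anomalous(call) else permitted).append(call)
        return permitted, flagged


if __name__ == "__main__":
    baseline = EdgeBaseline([
        ObservedCall("frontend", "catalog", "/items"),
        ObservedCall("checkout", "payments", "/charge"),
    ])
    live = [
        ObservedCall("frontend", "catalog", "/items"),
        ObservedCall("catalog", "payments", "/charge"),  # catalog never called payments before
    ]
    permitted, flagged = baseline.partition(live)
    print(flagged)  # [ObservedCall(src='catalog', dst='payments', path='/charge')]
```

The same set of allowed edges could double as a policy for a network overlay (permit only these service pairs), which is one way the monitoring and securing roles described above might share a single view of the traffic.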

While I genuinely believe that the network will play an immensely strategic role in the microservices world, inspecting and storing billions of API calls a day will require significant computing and storage resources. In addition, deep packet inspection can be challenging at line rate, so sampling, at the expense of full visibility, might be an alternative. Finally, network traffic analysis must be combined with the service-level telemetry data we already collect today in order to get a comprehensive, in-depth picture of the distributed application.
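Some rough arithmetic shows why sampling comes up: at the Netflix figures quoted earlier, 99% of 5 billion is about 4.95 billion internal API calls per day, or roughly 57,000 per second on average (4.95 billion / 86,400 seconds), before accounting for peaks. Below is a minimal Python sketch of one common approach, hash-based flow sampling: hash each flow’s identifier and deep-inspect only the flows whose hash falls below a threshold set by the sampling rate. The `should_inspect` function and the flow-key string format are assumptions for illustration, not a specific tool’s interface.

```python
import hashlib


def should_inspect(flow_key: str, rate: float) -> bool:
    """Return True if the flow identified by flow_key should be deep-inspected.

    The decision is a deterministic function of the key, so every packet of a
    sampled flow gets the same answer and each sampled API call is seen whole.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(flow_key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    threshold = int(rate * 2**64)              # integer compare avoids float rounding at rate=1.0
    return value < threshold


if __name__ == "__main__":
    # Hypothetical flow keys: source and destination endpoints of a TCP connection.
    flows = [f"10.0.1.{i % 250}:{40000 + i}->10.0.2.9:8080/tcp" for i in range(10_000)]
    sampled = sum(should_inspect(f, 0.1) for f in flows)
    print(f"inspecting {sampled} of {len(flows)} flows")
```

Sampling per flow rather than per packet keeps every inspected API call intact, at the cost of never seeing the unsampled flows at all; that is exactly the visibility trade-off described above.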

With the microservices architecture creating whitespace in the infrastructure market, it will be interesting to see how the vendor landscape divides up between incumbents and new entrants. But no matter how it turns out, with microservices gaining a strong foothold across enterprises of all shapes and sizes, the network is surely going to play a critical role in monitoring, managing and securing distributed applications for years to come!

Special thanks to @jvrionis, @arifj and @ravi_lsvp for their suggestions on this post and to @adrianco for pointing to a few references used here