Deliberate Decomposition: Moving to Microservices in a Considered Way
In the first half of 2019, we were informed of a new regulatory requirement that would involve significant changes to two of Enova’s lending products that ran on monolithic, legacy code. One of our guiding engineering principles is to continually work to make our artifacts safer and more extensible, and we worried that simply adding the new functionality into the existing implementations would incur further technical debt and duplicate QA work.
Design, but driven by the domain
Microservice (and Service Oriented Architecture) systems are famous for collapsing into ineffable distributed monoliths or galaxies of unnecessary microservices. How did we — knowing that we would require new services — arrive at a good architecture?
First, we modeled our domain — that is, we rigorously understood our problem space. Once we had that understanding materialized, it was clear that some concepts were already complete in existing line-of-business applications, some were extractable but poorly modeled in these applications, and some were entirely novel. Commonality, use, and long-term architectural direction informed which domain models would benefit from externalization.
The moment of reckoning was in front of us. We had several dimensions in our tradeoff optimization:
- green field purity vs. shipping quickly,
- monolithic cohesion vs. micro-serviced decomposition,
- opinionated tooling vs. unstructured freedom.
None of these dimensions were independent from the others, and we iterated over a number of designs before settling on our approach.
With our general approach, we worked hard to build a 1:1 map between each new service and a coherent, sensible abstraction. This of course included separation of responsibility and single sources of truth. With those considered abstractions in hand, interface design proceeded better than it otherwise would have.
First-class interfaces
One of our primary design features was to make our services as explicit and expressive as possible, to make our modularization more reliable and our coupling looser. To this end, we enforced explicit contracts on every inter-service interaction and performed ongoing static analysis on the set of contracts as a whole. We developed our new services from the outside-in, to allow greater concurrency of development and independence when testing. Importantly, our tooling around these interfaces was designed from the get-go to be iterable, as optimization and changes in the problem space would inevitably require their updating.
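To make the idea of explicit contracts a little more concrete, here is a minimal sketch in Python. The event name, its fields, and the `contract_fields` check are all hypothetical stand-ins, not our actual contracts or static-analysis tooling, but they show the shape: every inter-service event declares its fields and a pinned schema version, and a simple pass over the contract set can assert invariants about all of them at once.

```python
# A minimal, illustrative contract: a frozen dataclass per event type,
# with an explicit schema version. These names are hypothetical.
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class LoanApplicationSubmitted:
    """One inter-service event, with its schema version pinned."""
    schema_version: int
    application_id: str
    product_code: str
    submitted_at: datetime


def contract_fields(event_cls) -> set[str]:
    """The kind of check a static pass over the contract set might run:
    every event exposes its field names and carries a schema_version."""
    names = {f.name for f in fields(event_cls)}
    assert "schema_version" in names, f"{event_cls.__name__} is missing schema_version"
    return names
```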
We also chose to make these new services event-driven, an approach that suits our core business well — our complexity accrues along a long life cycle, with very little horizontal coupling between data. What’s more, publishing events along a common fabric allows for easy tooling for observability and debugging, as one can listen to events to isolate exactly what happened where. With data in motion and minimally-stateful processors, we get some of the virtues of functional programming at a system-wide level.
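As an illustration of that event-driven shape, the sketch below uses a toy in-memory bus (the production fabric would be a real broker; none of these class or topic names are ours) to show how a debugging tap can observe every event without any producer knowing it is there.

```python
# A toy, in-memory stand-in for a shared event fabric. Producers publish
# to topics; consumers subscribe; an observability "tap" sees all traffic.
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[str, Any], None]


class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._taps: list[Handler] = []  # listeners that receive every event

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def tap(self, handler: Handler) -> None:
        """Attach a debugging/observability listener to all topics."""
        self._taps.append(handler)

    def publish(self, topic: str, event: Any) -> None:
        for handler in self._taps + self._subscribers[topic]:
            handler(topic, event)


# Hypothetical usage: a trace tap plus one downstream consumer.
bus = EventBus()
bus.tap(lambda topic, event: print(f"[trace] {topic}: {event}"))
bus.subscribe("applications", lambda topic, event: print(f"underwriting saw {event}"))
bus.publish("applications", {"application_id": "A-123", "state": "submitted"})
```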
With high demands for reliability, correctness, and flexibility, we chose to use hexagonal architecture — diligently separating our transport, serialization, and business logic layers so that each was as functionally pure as possible. This bought us the ability to transpose a piece of business logic to another service with minimal side effects, as well as the ability to change the operations of our new services without endangering developed features.
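A rough sketch of what that hexagonal split can look like, with hypothetical names throughout: the business rule is a pure function, serialization is a thin adapter that turns bytes into domain objects, and the transport handler only wires the two together, so the core logic can be lifted into another service or retested without dragging any I/O along.

```python
# Illustrative ports-and-adapters layering; not our production code.
import json
from dataclasses import dataclass


# --- business logic layer: pure, no I/O, trivially testable -----------------
@dataclass(frozen=True)
class Payment:
    amount_cents: int


def apply_payment(balance_cents: int, payment: Payment) -> int:
    """Core rule: reduce the balance, never below zero."""
    return max(0, balance_cents - payment.amount_cents)


# --- serialization layer: translate wire formats into domain objects --------
def decode_payment(raw: bytes) -> Payment:
    body = json.loads(raw)
    return Payment(amount_cents=body["amount_cents"])


# --- transport layer: whatever delivers the message (HTTP, queue, ...) ------
def handle_message(raw: bytes, current_balance: int) -> int:
    payment = decode_payment(raw)
    return apply_payment(current_balance, payment)
```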
Initial lessons
After launching this new system, we’ve already had some valuable feedback.
Firstly, the emphasis on overall system understandability and observability has more than paid for itself, leading to great agility in debugging and updating the system. Secondly, our fervent commitment to embracing iterability was, if anything, not strong enough: building our overall system to be mutable forced us to respect our boundaries and contexts and to be precise about them.
Ultimately, our system design didn’t hinge on a question of how many services to run or how thinly to slice them — it relied on creating a flexible platform, on well-considered data models, and on making relationships between and operations on those models first-class citizens. An operation could be (and was!) moved between services without disrupting anything because we dethroned runtimes in favor of data and made sure our system was explorable.
And all this was possible because we kept our abstractive layers clear and separate.