Breaking Up a Monolith with Stateless Microservices
Microservices aren’t new. The term became a buzzword back in the ancient days of 2014, followed by a swarm of blog posts on how to break up our aging monoliths into service-oriented architectures. But like any fashionable tech trend, it’s not one-size-fits-all. If your monolith powers a complex, interdependent system — like Enova’s did — then if you break it up, you can end up with just some new (temporarily smaller) monoliths.
When we did just that back in 2012, we found that yes, the new services had divided up our platform. But instead of creating a true service-oriented architecture, we’d created a distributed system in which each service was still divided horizontally (API, domain, and data layers) and vertically (across each of our brands). We hadn’t addressed the main problem we were having: our brands’ development teams were still stepping on each other’s toes, which introduced risk and slowed them down.
A new hope
Our enterprise architect, Greg Lacy, joined the company two years ago and saw our struggles in a fresh light. His full vision for our future is more than I’ll cover in this post, but one nugget stands out as a helpful principle for any growing platform: instead of just breaking up a monolith into smaller monoliths, you can break up a monolith by extracting its logic to stateless services.
This is similar to the common pattern of encapsulating logic into service objects within a codebase, but because the new components are separate services, you aren’t limited: the new services can be in a different language, use different dependencies, or even be hosted in a different data center. And since they’re stateless, testing is fast and easy — no state to set up or side effects to worry about.
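To make the idea concrete, here is a minimal sketch in Go of what a stateless service can look like: a pure function wrapped in an HTTP handler. Everything in it is an assumption made for illustration, including the names `ApplyPayment` and `handleApplyPayment`, the JSON fields, the `/apply-payment` path, and the interest-first allocation rule; none of it is taken from Enova’s actual services.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// PaymentRequest carries everything the calculation needs.
// The service looks nothing up and stores nothing. (Hypothetical shape.)
type PaymentRequest struct {
	PrincipalCents int64 `json:"principal_cents"`
	InterestCents  int64 `json:"interest_cents"`
	PaymentCents   int64 `json:"payment_cents"`
}

// PaymentResponse describes how the payment was split. (Hypothetical shape.)
type PaymentResponse struct {
	ToInterestCents         int64 `json:"to_interest_cents"`
	ToPrincipalCents        int64 `json:"to_principal_cents"`
	RemainingPrincipalCents int64 `json:"remaining_principal_cents"`
}

// ApplyPayment is a pure function: the same request always yields the same
// response. The rule (interest first, then principal) is an illustrative
// assumption; overpayment beyond the balance is simply ignored here.
func ApplyPayment(req PaymentRequest) PaymentResponse {
	toInterest := minInt64(req.PaymentCents, req.InterestCents)
	toPrincipal := minInt64(req.PaymentCents-toInterest, req.PrincipalCents)
	return PaymentResponse{
		ToInterestCents:         toInterest,
		ToPrincipalCents:        toPrincipal,
		RemainingPrincipalCents: req.PrincipalCents - toPrincipal,
	}
}

func minInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

// handleApplyPayment is a thin HTTP wrapper: decode, compute, encode.
func handleApplyPayment(w http.ResponseWriter, r *http.Request) {
	var req PaymentRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(ApplyPayment(req)); err != nil {
		log.Printf("encode response: %v", err)
	}
}

func main() {
	// Exercise the handler in-process; no server, database, or fixtures needed.
	body := `{"principal_cents":100000,"interest_cents":1500,"payment_cents":5000}`
	req := httptest.NewRequest(http.MethodPost, "/apply-payment", strings.NewReader(body))
	rec := httptest.NewRecorder()
	handleApplyPayment(rec, req)
	fmt.Print(rec.Body.String())
}
```

Because the handler holds no state, a test only has to build a request and compare the response, which is what `main` does here.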
Time for an example
For the past few months, I’ve been putting this approach to use as my team works to extract and simplify the accounting logic at the core of our legacy app. To set the mood, that section of code is over a decade old, written in Ruby 1.8.7, incredibly complex with only apocryphal documentation, and of course absolutely crucial (almost all API actions related to servicing our loans end up in it at some point).
Our plan, devised with Greg’s help, was to extract the accounting logic to a stateless service. With a common API built on our domain model, the service itself could be switched out by each brand. We also needed, as a new feature, to generate a separate accounting journal that follows GAAP-compliant delinquency rules. Since the stateless accounting logic is a reusable building block, we decided to run a second version of it behind a persistence wrapper, which receives and processes the same accounting events, to generate our financial records for reporting.
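As a rough illustration of that shape, the sketch below pairs a stateless accounting interface with a wrapper that owns the state. It is a simplified stand-in, not our real design: the `Accountant` interface, the `Event` and `Entry` types, the account names, and the in-memory map standing in for a database are all hypothetical.

```go
package main

import "fmt"

// Event is one accounting event for a loan. (Hypothetical domain model.)
type Event struct {
	LoanID      string
	Kind        string // "funding" or "payment" in this sketch
	AmountCents int64
}

// Entry is one journal line. (Hypothetical domain model.)
type Entry struct {
	Account     string
	DebitCents  int64
	CreditCents int64
}

// Accountant is the common API: events in, journal entries out, no state.
// Each brand could supply its own implementation behind this interface.
type Accountant interface {
	Post(events []Event) []Entry
}

// simpleAccountant is a toy implementation with double-entry postings.
type simpleAccountant struct{}

func (simpleAccountant) Post(events []Event) []Entry {
	var entries []Entry
	for _, e := range events {
		switch e.Kind {
		case "funding":
			entries = append(entries,
				Entry{Account: "loans_receivable", DebitCents: e.AmountCents},
				Entry{Account: "cash", CreditCents: e.AmountCents})
		case "payment":
			entries = append(entries,
				Entry{Account: "cash", DebitCents: e.AmountCents},
				Entry{Account: "loans_receivable", CreditCents: e.AmountCents})
		}
	}
	return entries
}

// Ledger is the persistence wrapper. It owns the event history; the
// Accountant it delegates to stays stateless.
type Ledger struct {
	accountant Accountant
	events     map[string][]Event // in-memory stand-in for a database
}

func NewLedger(a Accountant) *Ledger {
	return &Ledger{accountant: a, events: make(map[string][]Event)}
}

// Record stores the event, then recomputes the loan's journal from its
// full history by calling the stateless Accountant.
func (l *Ledger) Record(e Event) []Entry {
	l.events[e.LoanID] = append(l.events[e.LoanID], e)
	return l.accountant.Post(l.events[e.LoanID])
}

func main() {
	ledger := NewLedger(simpleAccountant{})
	ledger.Record(Event{LoanID: "loan-1", Kind: "funding", AmountCents: 50000})
	journal := ledger.Record(Event{LoanID: "loan-1", Kind: "payment", AmountCents: 10000})
	for _, e := range journal {
		fmt.Printf("%-18s debit=%d credit=%d\n", e.Account, e.DebitCents, e.CreditCents)
	}
}
```

Keeping persistence in the wrapper means the same `Accountant` can be called directly by a servicing app or reused inside the ledger without modification.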
Other than reducing the size of the monolith, there are some benefits here I want to call out explicitly.
First, the stateless nature of the service made it easy to expose directly to our stakeholders via a small React front-end, which gives them far more visibility into the logic. We often field questions from the business about how the accounting behaves in certain situations, and in the old system, answering one fully meant:
- Switching contexts and spinning up the legacy app
- Using a script to create, fund, and get a loan into the desired state
- Taking the requested action and sending the results over
- Repeating the process for a follow-up question about a slightly different scenario
Now, the process is simple and even self-serve:
- Open up the front-end tool
- Punch in the desired scenario
- See the results update in real-time
Even if your service isn’t a good fit for a front-end, since it’s stateless, it’s safe to use a tool like cURL or Postman to exercise even your production servers.
Second, we can use the same service in multiple ways: on its own (the demo tool), as a dependency (the primary use case), or layered into a bigger solution (the financial ledger). This means that, as your platform evolves, these services don’t have to be rewritten. For example, we have a stateless amortization service that our old, new, and likely any future servicing platforms all leverage. Because any of our apps can call these microservices, they make great building blocks for new solutions.
Third, testing. I touched on this above, but it’s worth repeating: when a service has no internal state, no laborious “set up” is needed for any scenario. You just send the right request, and you get the same response each time. This is a powerful property when it comes to writing acceptance tests.
For example, with the Accountant, we crafted a set of golden scenarios, defined by our stakeholders and modeled on real loans, as a set of JSON input files. We generated an output CSV for each, mapped from the JSON, and had it validated by the accounting department; once accepted, we saved each pair into the repo as an automatic acceptance test. These tests run with the rest of the suite, so if the results change, I know I broke something. And because no database is involved and the service is written in Go, running dozens of these tests takes well under a second.
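A golden-scenario check along these lines can be sketched in a few lines of Go. This is an illustrative sketch only: the `Scenario` fields, the `Journal` calculation, and the CSV columns are invented, and the JSON and CSV are inlined as strings so the example is self-contained, whereas in practice each pair would live in files in the repo and run under `go test`.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
)

// Scenario is the JSON input for one golden case. (Hypothetical fields.)
type Scenario struct {
	PrincipalCents int64   `json:"principal_cents"`
	PaymentsCents  []int64 `json:"payments_cents"`
}

// Journal stands in for the stateless service under test: one row per
// payment with the running balance. The real logic is far richer.
func Journal(s Scenario) [][]string {
	rows := [][]string{{"payment_cents", "balance_cents"}}
	balance := s.PrincipalCents
	for _, p := range s.PaymentsCents {
		balance -= p
		rows = append(rows, []string{
			strconv.FormatInt(p, 10),
			strconv.FormatInt(balance, 10),
		})
	}
	return rows
}

// RenderCSV maps the rows to the CSV text that stakeholders reviewed.
func RenderCSV(rows [][]string) (string, error) {
	var buf bytes.Buffer
	if err := csv.NewWriter(&buf).WriteAll(rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// CheckGolden runs one scenario and compares the output to the accepted CSV.
func CheckGolden(inputJSON []byte, goldenCSV string) error {
	var s Scenario
	if err := json.Unmarshal(inputJSON, &s); err != nil {
		return fmt.Errorf("parse scenario: %w", err)
	}
	got, err := RenderCSV(Journal(s))
	if err != nil {
		return fmt.Errorf("render csv: %w", err)
	}
	if got != goldenCSV {
		return fmt.Errorf("output drifted from golden file:\n got: %q\nwant: %q", got, goldenCSV)
	}
	return nil
}

func main() {
	input := []byte(`{"principal_cents": 10000, "payments_cents": [2500, 2500]}`)
	golden := "payment_cents,balance_cents\n2500,7500\n2500,5000\n"
	if err := CheckGolden(input, golden); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

The whole check is a string comparison on the output of a pure function, which is why a large batch of scenarios stays fast.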
Trade-offs and caveats
Like any approach, stateless services are no silver bullet. For one, there’s the added latency to consider. In our domain, we can tolerate this — many of our processes are back-end with no direct consumer impact, and we’re more concerned with throughput (side note: deploying these stateless services as AWS Lambdas gives us plenty of this) than latency. Still, we make sure to instrument these calls and monitor for anomalies or worrying trends.
Secondly, any time you make an HTTP call, you risk running into a network blip. Retry logic mitigates this, and for stateless services retries are safe: repeating a request has no side effects, so the only cost is the extra time. For longer outages, however, you’ll still experience disruption if the services can’t be reached. Moving to an asynchronous model, backed by a message queue, would mitigate this risk.
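For short blips, a retry helper with exponential backoff is usually enough. The sketch below is one possible shape, not the code we run: `Retry`, its parameters, and the simulated flaky call in `demo` are hypothetical. It is only safe because the call being retried is stateless, so a repeated request cannot apply the same change twice.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Retry invokes call up to attempts times, doubling the delay after each
// failure. It stops early if the context is cancelled.
func Retry(ctx context.Context, attempts int, baseDelay time.Duration, call func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := baseDelay
	for i := 0; i < attempts; i++ {
		if err = call(ctx); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return fmt.Errorf("retry cancelled: %w (last error: %v)", ctx.Err(), err)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// demo simulates a call that fails twice with a network error and then
// succeeds, and reports how many times it was invoked.
func demo() (calls int, err error) {
	flaky := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("connection reset")
		}
		return nil
	}
	err = Retry(context.Background(), 5, time.Millisecond, flaky)
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Printf("calls=%d err=%v\n", calls, err)
}
```

In real use, the function passed to `Retry` would wrap the HTTP request to the service, and the context would carry an overall deadline so retries cannot run indefinitely.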
Lastly, there’s a complexity cost here. We think it’s cheaper than the cost of keeping the logic nested in our monoliths, but for smaller teams or different businesses, it may not be the right model. Consider whether a component’s rate of change, or the demand to reuse it, is high enough to really warrant extracting it; don’t pull out a service “just because.”
Nothing I’ve proposed in this post is revolutionary. Deciding how to split up code amongst services is just a level above deciding how to split up code amongst classes or modules, and you already know from experience that pure functions are easier to reason about and refactor than state-carrying objects. Free yourself from the notion that every service needs to be full-stack, and it might just lead you to a more natural and scalable architecture.
And if you find this type of thing interesting and want to join us on our architectural journey, Enova is always hiring.