Embracing the Hybrid Approach: Our Microservice Communication Layer Journey at Paxos


The challenge

We need to decide which communication layer to use for our new microservice infrastructure.

We are the Blockchain Team at Paxos, responsible for the interaction of Paxos business processes with the various blockchains Paxos supports. For example: if a Paxos customer wants to withdraw crypto from their Paxos account, we have to create, sign and broadcast the transaction on the blockchain the customer wants to withdraw on.

Our internal clients are the services that Paxos customers interact with directly, such as the Paxos RESTful API and our website frontend. But internal services that handle wallet balancing and the minting and burning of stablecoins also rely on us to realize movements on the blockchain.

Let’s take a step back in time to understand how we found ourselves in the situation of deciding on a communication layer for microservices.

Starting simple: The monolith

In the past few years at Paxos, we found ourselves in the typical fast-paced environment of any burgeoning tech company. We swiftly rolled out services, accommodated customer requests, and continuously adapted to ever-changing market conditions.

Like many early-stage software companies, we initially developed our software application with a monolithic structure. A well-designed monolith offers several advantages that were critical during our initial build phase. It was sufficient to get us off the ground, enabled simpler deployment, required fewer resources for development and maintenance, and proved to be a cost-effective strategy for rapid launch.

One of the defining traits of monolithic structures is the high degree of control they afford over application logic. Coupled with their self-contained nature, this makes them easier to debug and understand — an invaluable advantage, especially for smaller teams with limited resources or constrained timelines. The simplicity and control offered by monoliths made them the ideal choice for us as we launched our services.

Maturing applications

With our products growing and achieving success, our software now serves millions of end-users. As such, the maturity of our software architecture has become equally vital. Our journey began in a self-contained system, which afforded us the agility to adapt and implement changes rapidly. However, this system has transformed over time into one with tightly coupled components. Major pivots to our business processes are less frequent in our current stage, but the need to scale and adapt to changing technology and usage requirements is more crucial than ever.

As our software engineering team expanded, the limitations of our monolithic structure started to surface. The very advantages that fueled our speedy progress in the early days were now presenting challenges. For instance, the monolith made it hard to onboard new engineers effectively, to scale distinct parts of our application in isolation, and to update technologies in separate areas of the application, among other things. The necessity for a structural shift became increasingly evident.


Microservices to the rescue

Microservices have become a popular architectural design strategy. This paradigm decomposes a system into decoupled components that exist as isolated services. This decoupling presents a multitude of advantages:

  • Independent development: Engineers can work on different services independently without impacting other components. New hires can create value much faster, and several teams can collaborate on different parts of the same application without stepping on each other’s toes.
  • Scalability: Microservices are inherently scalable. As some aspects of our business need to scale up, we can selectively scale those services in isolation.
  • Deployment independence: Each microservice can be updated and deployed independently, allowing for zero-downtime deployments and continuous delivery.
  • Resilience: Major changes or failures in one component won’t affect the others, ensuring the overall system’s continuity and robustness. By eliminating single points of failure, we create a system where one malfunctioning module does not compromise the entire system.
  • Technological freedom: Individual services aren’t limited to specific tools, technologies, or even programming languages. This flexibility allows us to tackle technology-related challenges using the best tools for the job, enhancing our problem-solving efficiency.

It’s not all smooth sailing

So, microservices will solve everything, right!? By simply decoupling core services from the monolith, it seems that all our problems will be resolved — at least, that’s the initial thought.

However, despite its promising advantages, transitioning to a microservices architecture is not without hurdles and must be implemented thoughtfully and correctly to avoid introducing new complexities.

One significant challenge lies in designing effective inter-service communication within this decoupled environment. In this post, we look at this challenge and how we approach it at Paxos.

In a monolithic structure, components communicate through direct method invocation. This is simple and effective when all components reside in the same codebase. However, in a microservice environment, things are more complex. We need an entirely different approach to facilitate communication between our isolated services.

As the microservices design pattern has matured, we’re spoilt for choice when it comes to technology options for inter-service communication: gRPC, RabbitMQ, Kafka, and Temporal, to name a few.

While these technologies are powerful tools in the right hands, they also introduce another layer of complexity. Each choice has pros and cons, and we must make this selection thoughtfully while keeping our unique business needs and existing architecture in mind. This inter-service communication will serve as the lifeblood of our system, ensuring that our various microservices can interact and function as a cohesive unit.

For user-facing requests, we use gRPC for inter-service communication. However, for less latency-sensitive processes, we have a choice of strategies.


Choreography vs. Orchestration

Orchestration and choreography represent two high-level strategies for managing interactions in a microservice architecture. Both strategies address how business processes execute over a network of distinct microservices.

Choreography: Event-driven interaction

Choreography is an event-driven architecture where each microservice independently understands its responsibilities.

Microservices consume, respond to, and potentially generate messages across various message queues; there is no central place tracking subscriptions and dependencies. A message broker, such as RabbitMQ, or an event-streaming platform, like Kafka, typically manages these queues.

Choreography works best in environments where processes and their intermediate steps:

  • are stateless;
  • rely exclusively on the input message without needing additional context from the overarching business process;
  • mainly transform, forward, or persist data from input messages;
  • follow one another clearly (e.g., Directed Acyclic Graphs, streaming);
  • progress in one direction (i.e., retry until complete);
  • benefit from decentralization for increased flexibility, such as scaling up or modifying individual steps of the process in isolation, without needing to touch a central orchestrator.

Note, however, that choreography can make it challenging to trace, debug, or monitor the processes triggered by an event, particularly when the event stream is nonlinear. This becomes especially apparent when processes have to be reverted or rolled back.
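
To make this more concrete, here is a minimal sketch (in Go, purely for illustration) of the shape of a choreographed step: a stateless handler that turns one incoming event into zero or more outgoing events, relying only on the input message. The event type and topic names are assumptions, not part of our actual codebase.

```go
// A choreographed step: a stateless function of the incoming event only.
package choreography

// Event is a simplified stand-in for a message on a queue or topic.
type Event struct {
	Topic   string
	Key     []byte
	Payload []byte
}

// HandleEvent transforms one input event into zero or more output events.
// It relies exclusively on the input message; it never consults the state
// of the overarching business process. The broker client (e.g., RabbitMQ
// or Kafka) is responsible for delivering the input and publishing the outputs.
func HandleEvent(in Event) ([]Event, error) {
	// Parse, transform, filter, or persist data from the input message here.
	out := Event{Topic: "next-step", Key: in.Key, Payload: in.Payload}
	return []Event{out}, nil
}
```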


Orchestration: Command-driven interaction

Orchestration is a command-driven architecture guided by a central controller, or “orchestrator.”

This orchestrator executes business processes along predefined workflows and determines the control flow of services, managing their interaction. A workflow orchestration framework like Temporal interacts directly with all services needed to complete a workflow and stores the workflow’s state in a central database.

Orchestration excels when processes and their intermediate steps:

  • are stateful;
  • require waiting for the completion of intermediate steps, such as credit card approval or transaction confirmation;
  • can lead to a conditional choice of subsequent actions;
  • must be carried out atomically, that is, entirely or not at all (akin to database transactions);
  • need to be centralized in one place for visibility.

Note, however, that workflow management comes at a cost: all state related to the workflow is held in a central database managed by the orchestration framework.
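
As a rough sketch of what this looks like from a client’s perspective, the snippet below uses the Temporal Go SDK to start a workflow and wait for its result; the workflow ID, task queue and workflow name are placeholders, not our production setup. Because the orchestrator persists every state transition, the workflow’s progress can be inspected centrally at any time.

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Connects to a Temporal server on localhost by default.
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// The orchestrator persists every state transition of this workflow in its
	// central database, so progress can be inspected at any time.
	we, err := c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
		ID:        "example-workflow-id",
		TaskQueue: "example-task-queue",
	}, "ExampleWorkflow") // workflow type registered elsewhere by a worker
	if err != nil {
		log.Fatal(err)
	}

	// Block until the workflow completes and fetch its result.
	var result string
	if err := we.Get(context.Background(), &result); err != nil {
		log.Fatal(err)
	}
	log.Println("workflow finished with:", result)
}
```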


Strategy risks
  • Chaos: If our system leans too heavily towards choreography, it risks “descending into chaos” as tracing processes triggered by an event can become nearly impossible (akin to a “Pinball Machine Architecture”).
  • Monolith: On the flip side, if our system overemphasizes orchestration, it risks reverting to a monolithic structure, as all business logic gets squeezed into a single centralized orchestrator.

The term “Pinball Machine Architecture” is colloquially used to describe an event-driven architecture (EDA) with heaps of side effects and interdependencies where processes are hard to trace, debug, and update when requirements change.1

1 Tamimi Ahmad, Event-Driven Architecture Myth Busting — Part 2: Five More EDA Claims, 2021

Striking a balance between choreography and orchestration is difficult: too much choreography risks chaos, while too much orchestration risks recreating the monolith.

Which strategy to choose? It depends!

The suitable strategy for our microservice architecture depends on its nature. For reacting to events with varying responses, choreography is a great fit. For well-defined, stateful business processes, orchestration is more suitable.

As a result, ideal designs should not rely too heavily on either strategy. Instead, they should incorporate elements from both patterns. For example, we can break down an application into multiple orchestrated flows that are interconnected via choreographed components.


Paxos system requirements

Introducing a message-oriented middleware is more than just a technology choice, as it needs to meet the requirements of different aspects of our system. A one-size-fits-all solution may not exist.


Well-defined business processes

On the one hand, our system handles well-defined business processes, such as withdrawal requests, wallet balancing and stablecoin issuance. These requests trigger workflows consisting of well-defined steps that our services mostly execute sequentially.

Moreover, we need the capability to effectively track request fulfillment, enabling us to communicate their status accurately to our internal clients.


Example: Bitcoin Withdrawal Request (Simplified)

Consider a situation where an end-user requests a Bitcoin withdrawal via the Paxos API. This action triggers a workflow that includes creating and signing the transaction(s), broadcasting them, monitoring transaction status, and completing the withdrawal request once enough confirmations have been received.

To communicate the withdrawal request status to the end user, a high-level, ideally centralized, process must oversee the workflow’s status. This allows for real-time updates regarding the overall status, transaction hash(es) or confirmations. It also enables the system to respond to certain situations, such as when a transaction isn’t included in the blockchain due to low transaction fees.

Blockchain state changes

On the other hand, our system processes event-driven updates to the blockchain ledger. The addition of new blocks to the blockchain, which might replace previously added blocks (reorgs), triggers multiple execution paths. These paths parse and transform the original event data (the newly added block) into data updates relevant to various blockchain services.

Our focus here is not on monitoring the status of individual updates to the blockchain ledger as they propagate through our data pipeline. Instead, we must ensure that all changes to the blockchain state are appropriately reflected in the necessary updates to Paxos data stores.

Blockchain state refers to the overall state of a blockchain, including information on the balances of all accounts, the state of all smart contracts, and other relevant data. It represents a snapshot of the blockchain at a specific point in time and is maintained by all nodes in the P2P network. The state of the blockchain is typically updated when a new block is added to the chain or a reorg occurs.

Example: New Block Mined on Bitcoin (Simplified)

When a new block is mined, it can initiate several processes involving various isolated services. These processes include a “ReOrg Detector” to identify any replaced blocks, a “Block Parser” to parse transactions and transfers from the new block, an “Address Filter” to filter transfers relevant to Paxos, a “UTXO Manager” to update UTXO sets, a “Transfer Monitor” to update transfer statuses, and a “Balance Service” to update the balances of wallets (collections of addresses) relevant to Paxos.

These transformations and updates can occur in parallel and are independent. The actual information flow is flexible as long as all updates are propagated throughout the system.
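
The sketch below illustrates this fan-out using the segmentio/kafka-go client in Go: each service subscribes to the same new-block topic under its own consumer group, so every service sees every block and processes it independently and in parallel. The broker address, topic and group names are assumptions for illustration, not our actual configuration.

```go
// Sketch: every downstream service subscribes to the same new-block topic with
// its own consumer group, so each one sees every block independently.
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// runConsumer runs one independent subscriber (e.g., "block-parser", "reorg-detector").
func runConsumer(ctx context.Context, groupID string, handle func(kafka.Message) error) {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "btc-blocks", // assumed topic carrying newly mined blocks
		GroupID: groupID,      // distinct group per service => independent fan-out
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(ctx) // offsets are committed per consumer group
		if err != nil {
			log.Printf("%s: %v", groupID, err)
			return
		}
		if err := handle(msg); err != nil {
			log.Printf("%s: handling block failed: %v", groupID, err)
		}
	}
}

func main() {
	ctx := context.Background()
	go runConsumer(ctx, "reorg-detector", func(m kafka.Message) error {
		// check whether this block replaces previously seen blocks
		return nil
	})
	runConsumer(ctx, "block-parser", func(m kafka.Message) error {
		// parse transactions and transfers from the raw block
		return nil
	})
}
```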

Paxos solution: Embracing a hybrid model

After carefully analyzing our requirements, we’ve adopted a hybrid approach for our communication layer.

1. Orchestration for internal requests

Paxos business processes that rely on blockchain interaction, such as withdrawal requests, wallet balancing, and stablecoin minting, fit well with a command-driven architecture.

We can clearly define every step of the process, including conditional paths depending on the state of the process. For example, we may need to repeat the transaction creation step if a transaction isn’t included in the blockchain due to low transaction fees (replace-by-fee, or RBF).

For these well-defined business processes triggered by internal requests, we decided to use workflow management with Temporal. The central orchestrator records information such as request identifiers and transaction hashes, enabling us to easily track requests throughout the workflow lifecycle. This allows us to report the state of the original request to our customers or respond to similar requests.

Using Temporal also eliminates the need to maintain a complicated state machine, as Temporal manages the state of workflows and the communication with microservices under the hood. This reduces the complexity associated with implementing our business processes in a distributed system.
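
To give a feel for what this looks like, here is a heavily simplified sketch of the withdrawal flow from the earlier example expressed with the Temporal Go SDK. The activity names, signatures and retry condition are hypothetical and omit most of the real-world detail.

```go
// Sketch of a simplified Bitcoin withdrawal workflow with Temporal.
// Activity names and signatures are hypothetical; real activities would be
// implemented by the blockchain services and registered with a worker.
package withdrawal

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// WithdrawalWorkflow orchestrates a single withdrawal request. Temporal
// persists the workflow state, so the request's status, transaction hash
// and confirmations can be tracked throughout the workflow lifecycle.
func WithdrawalWorkflow(ctx workflow.Context, requestID string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 30 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	for {
		// Create and sign the transaction(s) for this withdrawal.
		var txHash string
		if err := workflow.ExecuteActivity(ctx, CreateAndSignTransaction, requestID).Get(ctx, &txHash); err != nil {
			return err
		}
		// Broadcast to the network.
		if err := workflow.ExecuteActivity(ctx, BroadcastTransaction, txHash).Get(ctx, nil); err != nil {
			return err
		}
		// Wait until the transaction has enough confirmations or gets stuck
		// (e.g., not included due to low fees).
		var confirmed bool
		if err := workflow.ExecuteActivity(ctx, AwaitConfirmations, txHash).Get(ctx, &confirmed); err != nil {
			return err
		}
		if confirmed {
			// Mark the withdrawal request as complete.
			return workflow.ExecuteActivity(ctx, CompleteWithdrawal, requestID).Get(ctx, nil)
		}
		// Not confirmed: loop and re-create the transaction with a higher fee (RBF).
	}
}

// Hypothetical activity stubs; real implementations live in our services.
func CreateAndSignTransaction(requestID string) (string, error) { return "", nil }
func BroadcastTransaction(txHash string) error                  { return nil }
func AwaitConfirmations(txHash string) (bool, error)            { return false, nil }
func CompleteWithdrawal(requestID string) error                 { return nil }
```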


2. Choreography for blockchain state changes

For processes that update our data stores in response to changes to the blockchain ledger, we decided to use event-driven communication via Kafka.

External events, such as the addition of new blocks to the blockchain, trigger these processes, which flow in one direction like a data pipeline. Services that parse, transform, and filter blockchain state changes follow one another, and at each step no context from the overarching process is needed.

Using Kafka provides us with maximum flexibility to introduce additional future subscribers and to replay events to reconcile any inconsistencies between Paxos data stores and the blockchain state. This eliminates the need for centralizing process logic.

We have chosen Kafka because of its ability to scale with high-throughput systems, and we plan to use its message replay feature.
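
As a small illustration of the replay feature (again using segmentio/kafka-go, with assumed broker, topic and group names), a brand-new subscriber can replay the entire retained event history simply by joining with a fresh consumer group that starts at the earliest offset:

```go
package replay

import "github.com/segmentio/kafka-go"

// newReplayReader returns a reader for a brand-new subscriber. Because the
// consumer group has no committed offsets yet, consumption starts at the
// earliest retained event, effectively replaying the whole topic.
func newReplayReader() *kafka.Reader {
	return kafka.NewReader(kafka.ReaderConfig{
		Brokers:     []string{"localhost:9092"},
		Topic:       "btc-blocks",         // assumed topic of blockchain events
		GroupID:     "balance-reconciler", // fresh consumer group (assumed name)
		StartOffset: kafka.FirstOffset,    // start from the beginning of the topic
	})
}
```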


3. RPC for service implementation

We have implemented our microservices as RPC services to retain the flexibility to adapt to future strategy changes, such as introducing additional workflows orchestrated by Temporal. Because our microservices expose RPC interfaces, the central orchestrator can interact directly with the services required for a specific workflow.

For services interacting with our event stream, it is important to incorporate an adapter that can translate events into RPC calls and vice versa.
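
A minimal sketch of such an adapter is shown below. The consumer side uses the segmentio/kafka-go client, and the RPC side is represented by a small interface standing in for a generated gRPC client; the topic, group, method and message shapes are assumptions for illustration.

```go
// Sketch of an event-to-RPC adapter: it consumes events from the stream and
// forwards each one to the service's RPC interface.
package adapter

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

// TransferMonitorClient is a stand-in for the gRPC client generated from the
// service's protobuf definition (hypothetical method and message shapes).
type TransferMonitorClient interface {
	UpdateTransferStatus(ctx context.Context, rawTransfer []byte) error
}

// RunAdapter translates each event on the topic into one RPC call.
func RunAdapter(ctx context.Context, cli TransferMonitorClient) {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "parsed-transfers", // assumed topic name
		GroupID: "transfer-monitor-adapter",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(ctx)
		if err != nil {
			log.Printf("adapter: %v", err)
			return
		}
		// Translate the event into an RPC call on the service.
		if err := cli.UpdateTransferStatus(ctx, msg.Value); err != nil {
			log.Printf("adapter: RPC failed: %v", err)
		}
	}
}
```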

This setup ensures that our architecture is future-ready and can adapt to evolving requirements.


Conclusion

The Paxos Blockchain Team has adopted a hybrid approach to inter-service communication in our microservices architecture to meet our system requirements. For well-defined internal business processes, we leverage workflow management via Temporal. This ensures effective tracking and state management while avoiding a system with unclear side effects. Blockchain state changes employ a pipeline-style approach via Kafka for flexible, event-driven communication.

By implementing our microservices with gRPC interfaces, we ensure adaptability to future strategic shifts. This architecture optimizes our system’s resilience, scalability, and technological freedom, setting us up for future growth.
