Scale This!: "It Worked."

In my last post I briefly discussed statelessness. The key point about statelessness is that between HTTP calls, whether they fetch web pages or access web services or other HTTP-based services, each part of the system should be capable of being restarted without causing the others to fail.

Ideally, services should require no state. There are ways of maintaining a "session state" in many web platforms, but these should very much be seen as an optimisation or workaround, rather than a first port of call. I will look at the various session state patterns.

One side effect of statelessness is that we do not really have the context of a conversation available to us, so all the state required to complete a transaction must be available during a single request/response exchange and have no dependencies outside this. This characteristic is referred to as "Atomicity", and it normally produces a particularly interesting expression on a developer's face when realised for the first time. Usually it's the result of attempting to use a component that has adopted what I would call the "Barefaced Liar" pattern - all seemed fine when the code ran locally.

This is where a component is designed to expose a simple interface, of the kind common in UI development, whose behaviour is actually impossible to guarantee in a distributed environment. There are lots of examples of this, but often it's a Distributed Transaction Coordinator of some sort - although session state providers, and WCF’s "Bi-Directional" channel, can cause similarly nauseating effects.

So... we have no state available here in which to hold stateful conversations.

The mother of all conversations, in Software Circles at least, is the "2-Phase commit". This protocol ensures that when I ask for work to be done by, say, three co-operating participants, then either all of it is done, or else none is. The canonical example of this is the bank transaction. We want money to be taken out of account one and put into account two, but if one of those actions fails, we don't want to revert just that one action and pretend nothing untoward has happened. We don't want both accounts (or neither!) to end up with the money.

It essentially works like this. A coordinator asks each participant to do work and waits until it has received a confirmation reply from all participants.

The participants do their work up to the point where they will be asked to commit to it. Each remembers how to undo what it has done so far. Each then replies with an agreement to commit its work if the work was done, or a "failed" message if it could not perform the work.

If the coordinator received an agreement message from all participants, then it sends a commit message to them all, asking them to commit their work. On the other hand, if the coordinator received a fail message from any of the participants, then it sends a "rollback" message to all, asking them to undo their work.

Then it waits again for responses from all participants, confirming that they have committed or rolled back. Only when all have confirmed a successful commit does the coordinator inform whoever is interested that the transaction has succeeded.
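The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: everything runs in one process, nothing ever crashes or times out, and the names Participant, Vote and two_phase_commit are made up for this example rather than taken from any real transaction coordinator.

```python
from enum import Enum


class Vote(Enum):
    AGREE = "agree"
    FAILED = "failed"


class Participant:
    """Hypothetical in-memory participant holding a single balance."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self._pending = None  # tentative balance, held until commit or rollback

    def prepare(self, delta):
        # Phase 1: do the work tentatively and remember how to undo it.
        if self.balance + delta < 0:
            return Vote.FAILED
        self._pending = self.balance + delta
        return Vote.AGREE

    def commit(self):
        # Phase 2: make the tentative work permanent.
        if self._pending is not None:
            self.balance = self._pending
            self._pending = None
        return True

    def rollback(self):
        # Phase 2 (failure path): discard the tentative work.
        self._pending = None
        return True


def two_phase_commit(work):
    """work is a list of (participant, delta) pairs; returns True on commit."""
    participants = [participant for participant, _ in work]

    # Phase 1: ask every participant to do its work and collect all the votes.
    votes = [participant.prepare(delta) for participant, delta in work]

    if all(vote is Vote.AGREE for vote in votes):
        # Everyone agreed, so ask them all to commit and wait for confirmation.
        return all([participant.commit() for participant in participants])

    # At least one participant failed, so ask them all to roll back.
    for participant in participants:
        participant.rollback()
    return False


one = Participant("account one", 100)
two = Participant("account two", 0)
succeeded = two_phase_commit([(one, -30), (two, +30)])
print(succeeded, one.balance, two.balance)
```

Notice that each participant has to hold on to its tentative work in _pending between the first call and the second. That is conversational state, and it is lost if the participant is restarted in between.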

This is a staple pattern of Client Server Systems, particularly of the RDBMS kind. However, it is not guaranteed to work in a stateless system, because a conversation requires context and so is not stateless. This breaks our previous advice about maintaining disposability. A participant that is waiting to commit cannot be invisibly disposed of without side effects.

So... Operations exposed on the web should all be Atomic. They should fail or pass as a single unit.

To a client/server developer this of course seems at first sight to be monumentally impractical in many real world scenarios. But in fact most activities of any real-world value cannot use transactions as we know them.

In a standard web services integration, the BookFlight(), BookHotel() and BookCar() methods in a BookHoliday() operation may well be run by completely different organisations. Each of these organisations is likely to have differing policies and data platforms.

I've always found it funny that the poster boy for the 2-phase commit, The Bank Account Transfer, in reality cannot - and does not - use the 2-phase commit protocol.

So what does it use? Well, the answer lies in a fairly pungent concoction of asynchronous co-ordination, reconciliation and compensation that is definitely an acquired taste. In a later post I will come back to the specific ingredients you can use, but for now the important thing to remember is...

At the service boundary, operations should fail or pass in their entirety.
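As a rough sketch of what that can look like from the caller's side, here is the holiday example with the simplest form of compensation: if a later step fails, the earlier ones are cancelled before the failure is reported. The function names are Python stand-ins for the methods mentioned earlier, and the cancel function, the booking references and BookingError are all hypothetical. It also assumes every cancellation succeeds, which real systems cannot rely on.

```python
class BookingError(Exception):
    """Raised when a booking step cannot be completed."""


def book_holiday(steps):
    """Run each (book, cancel) pair in order; undo completed steps on failure.

    Returns the booking references if every step succeeds. Otherwise it
    cancels whatever was already booked and re-raises the BookingError.
    """
    completed = []  # (cancel, reference) for each step that succeeded
    try:
        for book, cancel in steps:
            reference = book()
            completed.append((cancel, reference))
    except BookingError:
        # Compensate in reverse order so the caller sees all or nothing.
        for cancel, reference in reversed(completed):
            cancel(reference)
        raise
    return [reference for _, reference in completed]


# Hypothetical stand-ins for three independent organisations.
cancelled = []


def book_flight():
    return "FL-1"


def book_hotel():
    return "HT-1"


def book_car():
    raise BookingError("no cars available")


def cancel(reference):
    cancelled.append(reference)


try:
    book_holiday([(book_flight, cancel), (book_hotel, cancel), (book_car, cancel)])
except BookingError:
    print("holiday not booked; cancelled:", cancelled)
```

The caller only ever sees a booked holiday or a failure. What happens when a cancellation itself fails is a separate problem, and not one this sketch attempts to solve.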

Next up: Concurrency

Monday, 13 July 2009
