Scale This!: The state I am in

One of the fallacies of distributed computing states that "Network topology doesn't change."

Well, yes it does, particularly on the internet. Pieces of hardware and software are constantly being added, upgraded and removed. The network is in a constant state of flux. So how do you perform reliable operations in such a volatile environment?

Well, the answer lies in part with the idea of statelessness. HTTP is what’s known as a stateless protocol. For those coming from a non-distributed platform background, statelessness is one of the most misunderstood aspects of web development.

Partly this is due to attempts to hide statelessness by bolting on some - often clunky - state management system. This is because statelessness is often viewed as a kind of deficiency. Framework developers all over the world say, "Oh yuk. Statelessness. This is soOOo difficult to work with. I want my developers to be able to put a button on a form and say OnClick => SayHello(). It's 2009! I don't want to have to worry about state issues on a button click!".

If you're dealing with a Windows Forms style of application, your button is more or less the same “object" that is shown on the user’s screen; but in many web frameworks, a "button" and its on screen representation in a user’s browser may be thousands of miles apart. The abstraction of “a button” assumes a single object, with a state - but this is not the reality. What appears to be one button is in fact two buttons: a logical one on a server, and a UI representation in a browser, with a “click” message being sent between them. The concept of "a button" is a leaky abstraction, and the reason that it's leaky, is because it does not take in to account the network volatility that is a standard feature of the web.

So, what do I mean by statelessness?

When you use the web, it enlists the resources required to fulfil your request dynamically. Once the request has been completed, the various physical components that took part are disconnected, and neither know nor care anything for each other anymore. How sad :-(

Let's look at it in terms of a conversation,

Customer: Hello
Teller: Hello, what can I do for you today?
Customer: I'd like to know my bank balance please.
Teller: Can I have your account number please?
Customer: It's 09-F9-11-02-9D-74-E3-5B-D8-41-56-C5-63-56-88-C0 ;-)
Teller: Well. Your balance is only $1,000,000.
Customer: Ok, Can I please transfer all of it to my Swiss bank account please?
Teller: Yeah sure, what's the account number?
Customer: 12345678.
Teller: Ok, I'll just do that now for you. Would you like a receipt?
Customer: Yes please.
Teller: Here you go => Receipt Provided.
Customer: Bye.
Teller: Bye.

This conversation contains state. Each role of Customer and Teller must know the context of the messages being transferred.

The stateless version of this would go more like...

Customer: Hello, I'd like to know the Balance for account 09-F9-11-02-9D-74-E3-5B-D8-41-56-C5-63-56-88-C0,
Teller: It's $1,000,000

Customer: Could you please transfer $1,000,000 from account 09-F9-11-02-9D-74-E3-5B-D8-41-56-C5-63-56-88-C0 to Swiss bank account number 12345678 and provide a receipt please.
Teller: Yup. Here you go => Receipt Provided.

The end results are the same. However in the first example, "fine-grained operations" are used. The Customer and the Teller need to know where they are in the conversation to understand the context of each statement. In the second, each individual part of the conversation contains all of the information required to perform the requested task.

If we think of these as client and server message exchanges, then in the second example it doesn't matter if the server (teller) is restarted, reinstalled or is blown-up and replaced between the two requests. This is invisible to the end user. You would never be aware of the swap.

However, in the first example, if the server was restarted it would "forget" where you were in the conversation.

Customer: Yes please.
Teller: Sorry? What?

...or in pseudocode...

Client: Yes please.
Server: Sorry? What? WTF? Ka-Boom!

This is a fundamental difference between programming in a “stateful” way, and distributed programming.

Understanding statelessness, its implications and how to manage state in a stateless environment, is a core skill of a web developer. Without understanding the implications of various state management systems, you simply cannot develop large scale applications. In web application there is ALWAYS a client, and a server, and a potentially lengthy request/response message exchange between them.

Most programmers have a fairly firm grip on Object-Oriented programming, and one of the keys to this is the packaging of state and behaviour. As a result, they get locked in to a system of thinking that says state and behaviour should always go together, hence the popularity of technologies such as CORBA, DCOM, and Remoting. These technologies make it easy for people who are used to dealing with "objects" to deal with these without worrying directly about the messaging process that goes on behind the scenes.

However, you often hear complaints about the scalability of these types of remote object technologies. The reason for this often involves assumptions about state. Since remoting interfaces are often automatically generated by whatever remoting framework you are using, and the objects used are very often designed deliberately in such a way as to be re-usable (although not necessarily performant) in both distributed and non-distributed environments, you end up with Chatty, context bound, fine-grained interfaces, which have a hard time keeping up when you have a large number of conversations going on at once.

Ceteris paribus, if you imagine the two conversations above in a remote system where each message exchange took a second, the first would take 7 seconds, the second 2.

What's also interesting is that the two message exchanges in the second conversation can be made either way round or, if required, in parallel. This of course works only if you are able to make some assumptions regarding the availability of cash to you, for example, that you have an overdraft facility of $1,000,000 which you are currently not using. I will come back to this when I discuss Concurrency and Data Consistency, but it is perfectly reasonable to run these operations in parallel. This, along with the rapid growth in multi-core and cloud computing, is one reason for the recent resurgence of interest in functional/stateless programming languages such as Erlang and F#.

The important thing to remember about statelessness is that it is all about maintaining a high degree of disposability. You should be able to throw everything away between message exchanges without a user noticing a server restart. Doing this will enable your application to handle server restarts and changes in network topology gracefully.

Next up: Atomicity.

Thursday, 2 July 2009

The state I am in

No comments:

Post a Comment