Monday 18 May 2009

The journey of 1000 miles

Right. First things first. Before you can have any reasonable discussion about how best to design applications for the web/cloud, you have to define what the web/cloud actually is, so I'll start with the web, as that's how most things in the world now talk to each other.

Everyone knows what the web is. You type in www.google.com, and enter "lolcats". Your browser connects to Google’s servers, sends "lolcats" as a search query, and Google searches its database and sends you lots of stuff back about, well erm... "cats with unusual grammar". Simple enough. However, as you may suspect, getting a list of cats (or whatever) to a screen thousands of miles away is more complicated than it may at first appear.

To see how this works in practice, we'll start with the simple example, that of fetching my first blog post. The commonly understood model of the web is that you connect to a server and download the requested information. This is a useful abstraction, but not entirely accurate.

The reality is more complicated, and involves a number of layers. Each layer builds upon the one below it. Each passes messages to the next layer via progressively more abstract protocols. So, the basic process works like this...

An end user using a browser asks for a resource in the form of a web address - a URL (Uniform Resource Locator)
http://scalethis.blogspot.com/2009/05/hello-world.html

This URL specifies the protocol (http://), domain (scalethis.blogspot.com) and resource (/2009/05/hello-world.html) requested.

We can pack this information up in an HTTP (Hypertext Transfer Protocol) request message which looks like this...

GET /2009/05/hello-world.html HTTP/1.1

Your browser then needs to find a machine that is capable of dealing with this message. To do this, it uses another Internet system called DNS (Domain Name System) to translate the domain into an actual machine to send the message to. This works like a telephone directory lookup. DNS finds that the name "scalethis.blogspot.com" is associated with the actual IP (Internet Protocol) address, 209.85.227.191.

You can see how this works by using "ping" from your command line.

C:\>ping scalethis.blogspot.com
Pinging blogspot.l.google.com [209.85.227.191] with 32 bytes of data


Now the browser knows...
what we are looking for (/2009/05/hello-world.html)
from where (209.85.227.191)
and how to ask for it (http)

Now, unlike some other networks, the internet’s big trick is that - despite appearing as if you connect to a remote machine - the TCP/IP protocol suite is in reality "connectionless". Instead of connecting directly to the remote machine, it basically packages up your request in the form of a message and writes an address on it. "Please send this to 209.85.227.191". It then sends this on to its nearest router, which forwards it on to another router, and another... until it reaches its destination.

You can see how this works by using "tracert" from your command line:

C:\>tracert scalethis.blogspot.com

Tracing route to blogspot.l.google.com [209.85.227.191]
over a maximum of 30 hops:
1 10.0.0.1
2 195.224.48.153
3 195.224.185.40
4 62.72.142.5
5 62.72.137.9
6 62.72.139.118
7 209.85.255.175
8 66.249.95.170
9 72.14.236.191
10 209.85.243.101
11 209.85.227.191


Here you can see all the machines through which your message has passed before finally reaching 209.85.227.191, where scalethis.blogspot.com can be found.

The clever part of this is that if one of the machines in the middle is suddenly unavailable, by way of either nuclear war or coffee spillage, the previous router can simply send the message to another router and so navigate around the problem, much in the same way that your SatNav would re-route you around Birmingham at rush hour. All this business of finding the shortest path and routing around traffic blackspots is a bit of rocket-science handled by various routing protocols, but we'll save that for another day.

So now your message has reached 209.85.227.191! Hooray!

Now, what to do with it? Well the server knows it's an HTTP message, which is a good thing because 209.85.227.191 is a web server, and knows how to understand the message

GET /2009/05/hello-world.html HTTP/1.1

It can see that you're asking to "GET" /2009/05/hello-world.html. "GET" is only one of a number of HTTP "verbs", some of which I'll describe in my next post. For now we can package up a response in order to reply. The HTTP server knows that the resource "/2009/05/hello-world.html" is held physically on "F:\Users\Temp\Backup\PleaseDontDeleteThis\ScaleThis\2009\05\hello-world.html", loads it up, and sends it back using the same forwarding technique.

Your browser reads this message, which contains HTML - like text but with lots of angled brackets - and displays it to you in a nicely formatted way! Yippee!

And it does all of this within a second or two (unless you're using AOL ;-)).

I've deliberately avoided talking in any detail about the higher level languages of the web (HTML, XML, SOAP, CSS, ECMA/JavaScript etc.) and what I'd refer to as the "overweb" made up of plug-ins (Flash/Silverlight, Java Applets, RealPlayer etc.), as I'll be discussing these quite a lot in the future. So for now the key takeaways are...

  • The web is a massive global network which uses messages to send information between computers, using routers.
  • These messages are all in standard formats (protocols) so that any software or hardware that sticks to those standards can understand them.
  • There is a fair amount of communication required to co-ordinate delivery of these messages, so the internet can be slow compared to networks that require a direct connection, such as traditional telephone networks. However, this co-ordinated exchange means that the web in its nature is flexible, reliable, and highly resilient to change.

If you're a developer you should get an overall idea of the how the protocols work. You don't necessarily need to understand the syntax of the Syn/Ack handshake in TCP, but you should at least know what the major protocols are, what they are for, and have an understanding of how they work, at least at a Wikipedia level. Your starting point can be found here...

http://en.wikipedia.org/wiki/Internet_Protocol_Suite

but the ones of particular concern to the functioning of the web are DNS, TCP, IP and HTTP. Whilst not strictly part of the internet protocol suite, you should have a read up on routing protocols too, particularly the Border Gateway Protocol (BGP).

Next time, I'll take a closer look at HTTP and some basic HTML/XML, and that should give us a reasonable common frame of reference on which we can build.

Thursday 14 May 2009

Hello World

Welcome to my blog. There are plenty of software blogs around (according to one colleague, the worlds first write-only medium), so why add yet another one? Well, I have a specific target audience. The people within my own company... although, as I am talking about things that may be of interest to the wider community, I decided to post on the web.

I work as a developer for a relatively small software company who is starting to get to grips with the ideas of Web applications, Software as a Service and the currently trendy term "Cloud Computing".

This company is a Microsoft partner, selling primarily a client-server style Windows forms-based application. So even some of our most experienced developers don't really have much experience dealing with the more esoteric subjects involved in Web development. The environment, culture, architectural style, and languages used on the web are quite different from the (traditionally) more centrally planned monoculture of enterprise software.

Later this year our company will begin exploring what we're referring to internally as "vNext" which is (possibly) the next major version of our core product (possibly a new product or collection of products). The redpills amongst us know (although not all are necessarily all that comfortable with the fact) that this really needs to be a scalable web application (or at the very least an application that has web architecture at its' core). Those people not yet entirely convinced, have increasingly flimsy reasons for keeping our core product as necessarily a client/server style application.

My aim is that if our company is moving in this direction (as I believe it has to), as a team we all need to have a better understanding of the nature of the web and the services that run on it, not only from a technical perspective, but from business, economic and environmental ones too if our company is to thrive in the future.

Over the coming weeks/months, I will be writing about all kinds of fascinating topics :-) but with the main aim of clarifying some of the concepts, patterns, practices, languages, dialects, rituals and sacrifices involved in producing large scale web applications.

My overall plan is to start with a general background to the Web, then to look at the principles, general constraints and architecture of distributed applications. I can then start to look at specific design patterns and practices that can be adopted to create web scale services and applications.

I've not written a blog before so any feedback would be greatly appreciated. (Particularly if you fundamentally disagree with anything I'm saying!)

Thanks for reading,

Chris.