
Monday, 18 January 2010

jQuery for you and me. Ab-sol-u-tiv-e-leeee.

Everyone who knows me knows why I'm a fan of jQuery (clear separation of concerns and CSS-style selectors), but jQuery 1.4 builds on one of the best JavaScript libraries and makes it even better.

The new release not only brings the standard maintenance-type stuff such as better performance and a couple of new functions, but also some good enhancements across the library.

You can now bind multiple event handlers at once, so instead of chaining a load of binding methods together, you can pop them all into the same call, like so:

$('#myClass').bind({
    mouseover: function() {
        // do something on a mouseover
    },
    mouseout: function() {
        // do something on a mouseout
    }
});

The new .detach() method allows you to remove elements from the DOM. It works exactly the same way as the .remove() method, but unlike .remove() it doesn't destroy any data held by jQuery on that element (including any event handlers added by jQuery).

This can be useful when you want to remove an element, but you know you'll need it again later on, so you can write code like:

var myClass = $('#myClass');

// Bind an event handler
myClass.click(function(){
    alert('Hello World!');
});

myClass.detach(); // Remove it from the DOM

// … do stuff here as if the item no longer exists.

myClass.appendTo('body'); // Bolt it back on to the DOM
myClass.click(); // alerts "Hello World!"


As far as DOM traversal goes, there is a new .unwrap() method which does the opposite of 1.3's .wrap() method. You can now use .has() to check whether an element contains matching descendants, and there are new "until" methods (.nextUntil(), .prevUntil() and .parentsUntil()) for limiting traversal of the DOM tree.
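As a quick sketch of these (assuming some made-up markup: a #myClass element sitting inside a wrapper div, a list whose items may contain links, a #start element, and a sibling carrying the class "stop"):

$('#myClass').unwrap();                   // remove #myClass's parent, keeping #myClass in the DOM
$('li').has('a').addClass('has-link');    // keep only the list items that contain a link, then mark them
$('#start').nextUntil('.stop').hide();    // hide the siblings after #start, up to (not including) .stop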

There's a new .delay() function for the effects queue, so you can time your animations the way you want them (there's a sketch of it after the example below), and you can now pass a map of attributes, css and even events when creating elements, so you can build up jQuery objects without needing to go through the process of adding .attr() to everything.

$('<div>', {
    id: 'myClass',
    css: {
        fontSize: 20,
        color: 'red'
    },
    click: function(){
        alert('myClass has been clicked!');
    }
}).text('Click Me!');
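
And a minimal sketch of the new .delay() method in action (the selector and timings are just for illustration):

$('#myClass')
    .fadeOut(400)
    .delay(2000)   // pause the effects queue for two seconds
    .fadeIn(400);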


Last (and probably least) there are new .focusin() and .focusout() events which allow you to take action when an element, or a descendant of an element, gains or loses focus, so writing watermarked text boxes should be (only fractionally) easier :-)
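For instance, a rough sketch of a watermarked text box (assuming an input with the id "search" and a "watermark" CSS class defined in your stylesheet, both made up for illustration) might look something like this:

var input = $('#search');
var watermark = 'Type your search here...';

input.focusin(function(){
    if (input.val() === watermark) {
        input.val('').removeClass('watermark');
    }
}).focusout(function(){
    if (input.val() === '') {
        input.val(watermark).addClass('watermark');
    }
});

// Show the watermark to start with
if (input.val() === '') {
    input.val(watermark).addClass('watermark');
}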

For further information you can RTFM here! :-). There is also a 14-day online event going on here that will get you up to speed with the changes.

Yum.

Monday, 29 June 2009

If you're not confused, you're not paying attention

In the last post we looked at the myths of zero latency and infinite bandwidth. I'd like to quickly cover some of the other fallacies before coming back to these, and how they result in some fundamental architectural constraints for distributed systems.

So let’s quickly check off a couple of these fallacies and their solutions.

The network is secure - It's not. The internet is a public network. Anything sent over it by default is equivalent to writing postcards. You wouldn't write your bank account details on a post card. Don't send them openly over the web.

Solution: Secure Sockets Layer (SSL), the famous “padlock” icon in your browser. Now my postcard is locked in a postcard sized safe. It can still go missing, but no-one can open it without the VERY long secret combination.

There is one administrator. - There is not. There are lots of people, all speaking different languages, with different skills, running different platforms with entirely different needs, requirements and implementations. The people deploying code to the web are often entirely separate from those responsible for keeping that code running; often these groups have nothing to do with each other.

Solution: When designing software for this type of environment, you should make absolutely no assumptions about access to or availability of particular resources. Always write code using a "lowest common denominator" approach. Try to ensure that when your system must fail or degrade, it does so gracefully. I will come back to this in a future post.

These problems resulting from the heterogeneity of distributed systems are mostly solved or worked around by the adoption of the various web standards. The point of the standards is that messages can be exchanged without the requirement for a specific platform, or all the baggage that goes along with that.

Right... take a deep breath. Some of the jargon used in the next couple of paragraphs may cause eye/brain-strain, but do not fear! All will become clear, hopefully in 6 posts' time. The next 3 posts will cover what I would regard as the most important considerations in building a large web-based system; each maps directly to one of the remaining fallacies:

  • Network topology doesn't change - Statelessness
  • Transport cost is zero - Atomic/Transactionless Interfaces
  • The network is reliable - Idempotence (yes, really!)


Following that, I'll discuss some of the characteristics and issues that emerge from systems built within these constraints. Specifically, Concurrency, Horizontal Scalability and Distributed Data Consistency (or inconsistency, as more accurately describes the issue).


So next time... Statelessness.

Friday, 5 June 2009

The limits of my language are the limits of my world

In my last couple of posts, I described the web and how it is built. The reason for this is to introduce the web to non-web developers - primarily Windows ones, as Windows developers "see" the world differently from web developers. This is as a result of the underlying operating environment.

I am about to make some massive generalisations for the purposes of illustrative clarity.

Windows developers tend to see things in terms of "objects", "containers" and "references" which all communicate via "events" that are raised and responded to. Web developers tend to see things in terms of "resources", "relations" and "links" which all communicate via "messages" that are sent and received.

Windows developers "create new objects" where web developers "post resources". This may all be argued to be a case of "You say tomato", but they represent fundamental differences between traditional Windows development and web development.

An "event" is an abstraction that performs poorly on the web. A windows developer new to web development will see an event in a way that is appropriate to windows development, but if you deal with events in a web application the same way you deal with events in a windows application, you will run in to trouble. However, If you see don't see an event, but a pair of messages, you can store, edit, copy, cache and queue a message. If you think of things in terms of events, it's difficult conceptually to get your head around the idea of a "cached event".

This mismatch of understanding between non-distributed and distributed development results in the famous "Fallacies of Distributed Computing".

These fallacies are a list of flawed assumptions made by programmers when first developing web applications, or in fact any distributed applications.

They are...

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Over the next few posts I will explain these incorrect assumptions, the problems they have caused (and continue to cause) for people new to web development, and the considerations, patterns and constructs that should be taken in to account to work around or overcome them.

Monday, 1 June 2009

The journey of 1000 miles (part 2)

In the previous post, we looked briefly at the protocols that make the web work. A browser starts by making an HTTP request. The domain name is resolved by DNS, and the message is sent over the network via routers to its destination, which then produces a response message and sends that back.

Now we know how messages are sent around the web, what kind of things are we able to send? Well... enter HYPERTEXT! (Wow!). The web was designed to send "hypertext" around. But what is hypertext? Essentially it's a body of text with an in-place reference to another body of text. If you ever had one of those adventure game books as a child (perhaps giving away my long-held nerdiness there), they work as a form of hypertext: "If you want to open the treasure chest, go to page 134, if not, go to page 14."

So, this is not necessarily a computery idea. In fact, the idea of hypertext is generally agreed to have originated with an engineer with the best name in computing history... Vannevar Bush, way back in 1945. Whilst there are some very well-known implementations of hypertext (Adobe PDF and Microsoft's original Encarta), it was the creation of the web in the early 1990s that eventually led to wide-scale adoption of hypertext.

The core language of the web, HTML (Hypertext Markup Language), provides a way of "marking up" text to contain references to another body of text. It does this using a mechanism called a "hyperlink", which describes the reference to another piece of text.

For example...I write a document and say in it...

The Vancouver Sun's statement that "The 2011 movie TRON 2 could become the most expensive film ever made with a reported budget of £300 million dollars." has been debunked.

I can augment this information by pointing at references, without breaking the flow of the text. I do this using an HTML "element" (a label with angled brackets around it) to point at a "hypertext reference". HTML elements can contain "attributes" which are values associated with the elements.

So for example... The "A" (anchor) element has an "href" (hypertext reference) attribute.
The basic format of this is...

The Vancouver Sun's statement that "The 2011 movie TRON 2 could become the most expensive film ever made with a reported budget of £300 million dollars" has been <a href="http://www.slashfilm.com/2009/04/13/">debunked</a>

A web browser knows not to display the anchor element directly, but to underline the contained text and allow a user to "click" on it. The browser knows that when the user clicks, this means "GET the document specified in the href attribute".

This Request/Response messaging pattern is core not only to the web but to the underlying TCP protocol over which the web (HTTP) runs.

Unlike the internet protocols used for something like Skype or RealPlayer, for a web browser to receive any information, it must first ask for it.

EVERYTHING you do on the web is affected by this architectural constraint, and it is one of the most important factors affecting the nature of the web and the design of the applications that run on it.
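To put that another way, here's a rough, era-appropriate sketch (the "/updates" URL is made up): even if the server has new information for you, your browser still has to keep asking for it, for example by polling.

// The server can't push; the browser has to ask, so we poll on a timer
function checkForUpdates() {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/updates', true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            // do something with xhr.responseText
        }
    };
    xhr.send(null);
}

setInterval(checkForUpdates, 30000); // ask again every 30 seconds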

This system of documents, links, requests and responses provides the fundamental application platform on which every single web application is built.

Interestingly, the bright sparks who designed HTTP decided that whilst HTML could provide a natural format for hypertext documents, HTTP should not REQUIRE documents to be in HTML format. They can in fact be in any format, including many that you'd maybe not think of as even being documents: text, images, videos, data and files containing programming code. Not only can documents be "linked", they can also be "embedded", so for example...

Using HTML you can link to an image, for example, using the IMG "tag" (which has a src "attribute"). Instead of forcing the end user to request the resource, the browser will fetch it automatically and place it inline where the tag is.

so this....

<img src="http://pbskids.kids.us/images/sub-square-barney.gif" />

...will render as...



GOSH!

Interesting historical note: the ability to "embed" was actually opposed by Tim Berners-Lee (inventor of the web), presumably because he'd been "barneyed". The IMG tag started out as a custom tag used only by Marc Andreessen's "Mosaic" browser - the pre-pre-precursor to today's Firefox - and it's certainly my view that if IMG had not been included in HTML, you'd not be reading this now.

In addition to linking and embedding documents, HTML has all sorts of tags for formatting and describing documents.
There are about 100 tags in version 4 of HTML (HTML5 will introduce about another 25-30) which enable all kinds of display, embedding and linking of document elements.

If you view the source code of this page, you will see a huge collection of tags. This is the "markup" used to describe how this page should be displayed.

Of course, complex designs necessitate complex descriptions, so you'll witness some fairly complicated-looking HTML out there in the wild. There is no fast way to learn all these tags, but as long as you understand the concepts of documents, elements and attributes, you should be able to build on this.

What I've tried to do over the last two posts is get everyone to a common definition of the web. So, we all now understand that the web "is made of"...


  • DNS - The Domain Name System - used to convert server names such as "www.myserver.com" into IP addresses such as 127.0.0.1

  • TCP/IP - The Transmission Control Protocol / Internet Protocol suite - used to send messages around the network

  • HTTP - The Hypertext Transfer Protocol - used to specify the purpose of a message sent over TCP/IP

  • HTML - The Hypertext Markup Language - used to define the content of the messages sent

The web is centred around the "Request/Response" messaging pattern and the linking and embedding of resources.

Cool. So now we're all on the same page. Onwards and upwards! Only 998 miles to go!

Monday, 18 May 2009

The journey of 1000 miles

Right. First things first. Before you can have any reasonable discussion about how best to design applications for the web/cloud, you have to define what the web/cloud actually is, so I'll start with the web, as that's how most things in the world now talk to each other.

Everyone knows what the web is. You type in www.google.com, and enter "lolcats". Your browser connects to Google’s servers, sends "lolcats" as a search query, and Google searches its database and sends you lots of stuff back about, well erm... "cats with unusual grammar". Simple enough. However, as you may suspect, getting a list of cats (or whatever) to a screen thousands of miles away is more complicated than it may at first appear.

To see how this works in practice, we'll start with the simple example, that of fetching my first blog post. The commonly understood model of the web is that you connect to a server and download the requested information. This is a useful abstraction, but not entirely accurate.

The reality is more complicated, and involves a number of layers. Each layer builds upon the one below it. Each passes messages to the next layer via progressively more abstract protocols. So, the basic process works like this...

An end user using a browser asks for a resource in the form of a web address - a URL (Uniform Resource Locator)
http://scalethis.blogspot.com/2009/05/hello-world.html

This URL specifies the protocol (http://), domain (scalethis.blogspot.com) and resource (/2009/05/hello-world.html) requested.

We can pack this information up in an HTTP (Hypertext Transfer Protocol) request message which looks like this...

GET /2009/05/hello-world.html HTTP/1.1

Your browser then needs to find a machine that is capable of dealing with this message. To do this, it uses another Internet system called DNS (Domain Name System) to translate the domain into an actual machine to send the message to. This works like a telephone directory lookup. DNS finds that the name "scalethis.blogspot.com" is associated with the actual IP (Internet Protocol) address, 209.85.227.191.

You can see how this works by using "ping" from your command line.

C:\>ping scalethis.blogspot.com
Pinging blogspot.l.google.com [209.85.227.191] with 32 bytes of data


Now the browser knows...
what we are looking for (/2009/05/hello-world.html)
from where (209.85.227.191)
and how to ask for it (http)

Now, unlike some other networks, the internet's big trick is that - although it appears as if you connect to a remote machine - the underlying IP layer is in reality "connectionless". Instead of a dedicated circuit to the remote machine, your request is packaged up in the form of a message with an address written on it: "Please send this to 209.85.227.191". It is then sent on to the nearest router, which forwards it on to another router, and another... until it reaches its destination.

You can see how this works by using "tracert" from your command line:

C:\>tracert scalethis.blogspot.com

Tracing route to blogspot.l.google.com [209.85.227.191]
over a maximum of 30 hops:
1 10.0.0.1
2 195.224.48.153
3 195.224.185.40
4 62.72.142.5
5 62.72.137.9
6 62.72.139.118
7 209.85.255.175
8 66.249.95.170
9 72.14.236.191
10 209.85.243.101
11 209.85.227.191


Here you can see all the machines through which your message has passed before finally reaching 209.85.227.191, where scalethis.blogspot.com can be found.

The clever part of this is that if one of the machines in the middle is suddenly unavailable, by way of either nuclear war or coffee spillage, the previous router can simply send the message to another router and so navigate around the problem, much in the same way that your SatNav would re-route you around Birmingham at rush hour. All this business of finding the shortest path and routing around traffic blackspots is a bit of rocket-science handled by various routing protocols, but we'll save that for another day.

So now your message has reached 209.85.227.191! Hooray!

Now, what to do with it? Well, the server knows it's an HTTP message, which is a good thing because 209.85.227.191 is a web server and knows how to understand the message:

GET /2009/05/hello-world.html HTTP/1.1

It can see that you're asking to "GET" /2009/05/hello-world.html. "GET" is only one of a number of HTTP "verbs", some of which I'll describe in my next post. For now we can package up a response in order to reply. The HTTP server knows that the resource "/2009/05/hello-world.html" is held physically on "F:\Users\Temp\Backup\PleaseDontDeleteThis\ScaleThis\2009\05\hello-world.html", loads it up, and sends it back using the same forwarding technique.
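The reply is itself an HTTP message. A simplified, illustrative response (headers trimmed right down, body abbreviated) looks something like this...

HTTP/1.1 200 OK
Content-Type: text/html

<html> ...the page's markup... </html>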

Your browser reads this message, which contains HTML - like text but with lots of angled brackets - and displays it to you in a nicely formatted way! Yippee!

And it does all of this within a second or two (unless you're using AOL ;-)).

I've deliberately avoided talking in any detail about the higher level languages of the web (HTML, XML, SOAP, CSS, ECMA/JavaScript etc.) and what I'd refer to as the "overweb" made up of plug-ins (Flash/Silverlight, Java Applets, RealPlayer etc.), as I'll be discussing these quite a lot in the future. So for now the key takeaways are...

  • The web is a massive global network which uses messages to send information between computers, using routers.
  • These messages are all in standard formats (protocols) so that any software or hardware that sticks to those standards can understand them.
  • There is a fair amount of communication required to co-ordinate delivery of these messages, so the internet can be slow compared to networks that require a direct connection, such as traditional telephone networks. However, this co-ordinated exchange means that the web in its nature is flexible, reliable, and highly resilient to change.

If you're a developer you should get an overall idea of how the protocols work. You don't necessarily need to understand the syntax of the SYN/ACK handshake in TCP, but you should at least know what the major protocols are, what they are for, and have an understanding of how they work, at least at a Wikipedia level. Your starting point can be found here...

http://en.wikipedia.org/wiki/Internet_Protocol_Suite

but the ones of particular concern to the functioning of the web are DNS, TCP, IP and HTTP. Whilst not strictly part of the internet protocol suite, you should have a read up on routing protocols too, particularly the Border Gateway Protocol (BGP).

Next time, I'll take a closer look at HTTP and some basic HTML/XML, and that should give us a reasonable common frame of reference on which we can build.

Thursday, 14 May 2009

Hello World

Welcome to my blog. There are plenty of software blogs around (according to one colleague, the world's first write-only medium), so why add yet another one? Well, I have a specific target audience: the people within my own company... although, as I am talking about things that may be of interest to the wider community, I decided to post on the web.

I work as a developer for a relatively small software company that is starting to get to grips with the ideas of web applications, Software as a Service and the currently trendy term "Cloud Computing".

This company is a Microsoft partner, selling primarily a client-server style Windows forms-based application. So even some of our most experienced developers don't really have much experience dealing with the more esoteric subjects involved in Web development. The environment, culture, architectural style, and languages used on the web are quite different from the (traditionally) more centrally planned monoculture of enterprise software.

Later this year our company will begin exploring what we're referring to internally as "vNext", which is (possibly) the next major version of our core product (or possibly a new product or collection of products). The redpills amongst us know (although not all are necessarily comfortable with the fact) that this really needs to be a scalable web application (or at the very least an application that has web architecture at its core). Those people not yet entirely convinced have increasingly flimsy reasons for keeping our core product as a client/server style application.

My aim is that if our company is moving in this direction (as I believe it has to), then as a team we all develop a better understanding of the nature of the web and the services that run on it, not only from a technical perspective but from business, economic and environmental ones too, so that our company can thrive in the future.

Over the coming weeks/months, I will be writing about all kinds of fascinating topics :-) but with the main aim of clarifying some of the concepts, patterns, practices, languages, dialects, rituals and sacrifices involved in producing large scale web applications.

My overall plan is to start with a general background to the Web, then to look at the principles, general constraints and architecture of distributed applications. I can then start to look at specific design patterns and practices that can be adopted to create web scale services and applications.

I've not written a blog before so any feedback would be greatly appreciated. (Particularly if you fundamentally disagree with anything I'm saying!)

Thanks for reading,

Chris.