Monday 1 June 2009

The journey of 1000 miles (part 2)

In the previous post, we looked briefly at the protocols that make the web work. A browser starts by making an HTTP request. The server's name is resolved to an address by DNS, and the request is sent over the network via routers to reach its destination, which then produces a response message and sends it back.

Now that we know how messages are sent around the web, what kinds of things can we send? Well... enter HYPERTEXT! (Wow!) The web was designed to send "hypertext" around. But what is hypertext? Essentially, it's a body of text containing in-place references to other bodies of text. If you ever had one of those adventure game books as a child (perhaps giving away my long-held nerdiness there), they work as a form of hypertext: "If you want to open the treasure chest, go to page 134; if not, go to page 14."
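The book is only paper, but its structure is easy to model as data: pages of text, each carrying references to other pages. Here's a minimal Python sketch of that idea. The PAGES table, its wording and the show/follow helpers are all hypothetical; only the page numbers 134 and 14 come from the example above.

```python
# Hypothetical adventure-book pages: each page is a body of text plus
# in-place references ("choices") naming other pages by number.
# Pages 134 and 14 echo the example above; page 1 and all the wording
# are invented for illustration.
PAGES = {
    1: {
        "text": "You find a treasure chest.",
        "choices": {"open the treasure chest": 134, "leave it": 14},
    },
    134: {"text": "The chest is full of gold!", "choices": {}},
    14: {"text": "You walk on, empty-handed.", "choices": {}},
}


def show(page: int) -> str:
    """Render a page the way the book prints it: text, then its references."""
    lines = [PAGES[page]["text"]]
    for label, target in PAGES[page]["choices"].items():
        lines.append(f"If you want to {label}, go to page {target}.")
    return "\n".join(lines)


def follow(page: int, choice: str) -> int:
    """Follow one reference: the paper equivalent of clicking a link."""
    return PAGES[page]["choices"][choice]


if __name__ == "__main__":
    print(show(1))
    print(show(follow(1, "open the treasure chest")))
```

The reader, not the book, decides which reference to follow; the web works the same way.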

So, this is not necessarily a computery idea. In fact, the idea of hypertext is generally agreed to have originated with an engineer with the best name in computing history... Vannevar Bush, way back in 1945. Whilst there are some very well-known implementations of hypertext (Adobe PDF and Microsoft's original Encarta), it was the creation of the web in the early 1990s that eventually led to wide-scale adoption of hypertext.

The core language of the web, HTML (Hypertext Markup Language), provides a way of "marking up" text so that it contains references to other bodies of text. It does this using a mechanism called a "hyperlink", which describes the reference to another piece of text.

For example, say I write a document containing this sentence...

The Vancouver Sun's statement that "The 2011 movie TRON 2 could become the most expensive film ever made with a reported budget of £300 million dollars." has been debunked.

I can augment this information by pointing at references without breaking the flow of the text. I do this using an HTML "element" (a label with angle brackets around it) that points at a "hypertext reference". HTML elements can carry "attributes", which are values associated with the element.

So, for example, the "A" (anchor) element has an "href" (hypertext reference) attribute.
The basic format of this is...

The Vancouver Sun's statement that "The 2011 movie TRON 2 could become the most expensive film ever made with a reported budget of £300 million dollars" has been <a href="http://www.slashfilm.com/2009/04/13/">debunked</a>

A web browser knows not to display the anchor element itself, but to underline the contained text and allow the user to "click" on it. When the user clicks, the browser knows this means "GET the document specified in the href attribute".

This Request/Response messaging pattern is core to HTTP, the protocol of the web, and to how it uses the underlying TCP protocol: the browser opens the connection and sends the first message.
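For the link above, the request half of that pattern can be sketched in Python with the standard html.parser and urllib.parse modules: find the href attribute of the anchor element, then build the plain-text GET message a browser would send when it's clicked. LinkFinder and request_for_click are hypothetical helpers, and a real browser sends many more headers than the single Host header assumed here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# A fragment of the marked-up sentence above.
SNIPPET = 'has been <a href="http://www.slashfilm.com/2009/04/13/">debunked</a>'


class LinkFinder(HTMLParser):
    """Collect the href attribute of every A (anchor) element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def request_for_click(href: str) -> str:
    """Build a (much simplified) HTTP GET message for a clicked link."""
    parts = urlsplit(href)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


finder = LinkFinder()
finder.feed(SNIPPET)
for href in finder.hrefs:
    print(request_for_click(href))
```

The host name from the href travels in the Host header, while the rest of the address becomes the path on the GET line.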

Unlike the protocols used by something like Skype or RealPlayer, on the web a browser must first ask for information before it can receive any.

EVERYTHING you do on the web is affected by this architectural constraint, and it is one of the most important factors shaping the nature of the web and the design of applications that run on it.

This system of documents, links, requests and responses provides the fundamental application platform on which every single web application is built.
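To make the ask-first constraint concrete, here is a minimal sketch using only Python's standard socket and threading modules. It runs a hypothetical toy server on your own machine (localhost, on a port the operating system picks) and a fetch function that walks through the steps recapped at the start of this post: name lookup (DNS), connection (TCP), request (HTTP) and an HTML response. Note that the server never sends anything until a request has arrived. This is a teaching sketch under those assumptions, not how real browsers or servers are built.

```python
import socket
import threading

# Hypothetical page the toy server returns; any bytes would do.
HTML_BODY = b"<html><body><p>Hello, web!</p></body></html>"


def serve_one(listener: socket.socket) -> None:
    """Toy server: wait for ONE request, then answer it and close."""
    with listener:
        conn, _ = listener.accept()
        with conn:
            request = b""
            # Send nothing until a whole request has been read
            # (HTTP request headers end with a blank line).
            while b"\r\n\r\n" not in request:
                chunk = conn.recv(1024)
                if not chunk:
                    return  # client hung up without asking for anything
                request += chunk
            head = (
                "HTTP/1.1 200 OK\r\n"
                "Content-Type: text/html\r\n"
                f"Content-Length: {len(HTML_BODY)}\r\n"
                "Connection: close\r\n"
                "\r\n"
            )
            conn.sendall(head.encode("ascii") + HTML_BODY)


def start_server() -> int:
    """Run the toy server on a free localhost port in the background."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=serve_one, args=(listener,), daemon=True).start()
    return port


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Browser-side sketch: name lookup, TCP connect, HTTP request, response."""
    # DNS step: turn a host name into an IPv4 address.
    family, sock_type, proto, _, address = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )[0]
    # TCP step: open a connection to that address.
    with socket.socket(family, sock_type, proto) as sock:
        sock.connect(address)
        # HTTP step: the browser must speak first, by sending a request...
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # ...and only then does a response come back (read until closed).
        response = b""
        while chunk := sock.recv(1024):
            response += chunk
    return response


if __name__ == "__main__":
    port = start_server()
    print(fetch("localhost", port).decode("ascii"))
```

Real browsers add many more request headers and real servers handle many connections at once; the point here is only the order of events.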

Interestingly, the bright sparks who designed HTTP decided that, whilst HTML provides a natural format for hypertext documents, HTTP should not REQUIRE documents to be in HTML format. They can in fact be in any format, including many that you might not even think of as documents: text, images, videos, data and files containing programming code. Not only can documents be "linked", they can also be "embedded". For example...

Using HTML, you can embed an image using the IMG "tag" (which has a src "attribute"). Instead of making the user click to request the resource, the browser fetches it automatically and places it inline where the tag is.

So this...

<img src="http://pbskids.kids.us/images/sub-square-barney.gif" />

...will render as...

[Image: Barney the dinosaur, embedded inline from the URL above]

GOSH!
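That Barney picture arrives as a GIF, not as HTML, and HTTP copes because it doesn't care what a response body contains: a Content-Type header simply labels it. Below is a minimal Python sketch, assuming the standard mimetypes module is used to guess the type from a file name (real servers may decide the type in other ways). build_response is a hypothetical helper, and the GIF bytes are a placeholder, not a real image.

```python
import mimetypes


def build_response(body: bytes, filename: str) -> bytes:
    """Wrap ANY bytes in an HTTP response, labelled by a Content-Type header."""
    # Guess the media type from the file extension, e.g. ".gif" -> "image/gif".
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is None:
        # Generic label for bytes whose type we can't guess.
        content_type = "application/octet-stream"
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


if __name__ == "__main__":
    # Placeholder bodies: the GIF bytes are only the GIF signature, not an image.
    for body, name in [(b"<p>GOSH!</p>", "page.html"), (b"GIF89a", "barney.gif")]:
        response = build_response(body, name)
        print(response.split(b"\r\n")[1])  # the Content-Type line
```

The same response shape carries an HTML page and a picture; only the label and the bytes change.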

Interesting historical note: the ability to "embed" was actually opposed by Tim Berners-Lee (inventor of the web), presumably because he'd been "barneyed". IMG was a custom tag used only by Marc Andreessen's Mosaic browser, the pre-pre-precursor to today's Firefox, and it's certainly my view that if IMG had not been included in HTML, you'd not be reading this now.
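Whether a reference is linked or embedded comes down to the tag it sits in. As a rough sketch (not how any real browser is written), Python's standard html.parser module can sort a page's references into those a browser fetches straight away (IMG src) and those it only fetches when clicked (A href). ReferenceFinder and the PAGE markup are hypothetical, reusing the two URLs from this post.

```python
from html.parser import HTMLParser

# Hypothetical page reusing the two URLs from this post.
PAGE = (
    "<p>The claim has been "
    '<a href="http://www.slashfilm.com/2009/04/13/">debunked</a>.</p>'
    '<img src="http://pbskids.kids.us/images/sub-square-barney.gif" />'
)


class ReferenceFinder(HTMLParser):
    """Sort a page's references into embedded ones and linked ones."""

    def __init__(self) -> None:
        super().__init__()
        self.embedded = []  # IMG src: a browser fetches these straight away
        self.linked = []    # A href: a browser fetches these only on a click

    # HTMLParser also routes self-closing tags such as <img ... /> here.
    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "img" and values.get("src"):
            self.embedded.append(values["src"])
        elif tag == "a" and values.get("href"):
            self.linked.append(values["href"])


finder = ReferenceFinder()
finder.feed(PAGE)
finder.close()
print("Fetched immediately:", finder.embedded)
print("Fetched on click:   ", finder.linked)
```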

In addition to linking and embedding documents, HTML has all sorts of tags for formatting and describing documents.
There are about 100 tags in version 4 of HTML (HTML5 will introduce roughly another 25-30), which enable all kinds of display, embedding and linking of document elements.

If you view the source code of this page, you will see a huge collection of tags. This is the "markup" used to describe how this page should be displayed.

Of course, complex designs necessitate complex descriptions, so you'll see some fairly complicated-looking HTML out there in the wild. There is no fast way to learn all these tags, but as long as you understand the concepts of documents, elements and attributes, you should be able to build on them.
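Viewing source shows the markup; tallying it shows how it breaks down into elements and attributes. This sketch again assumes Python's standard html.parser module, and both the TagTally class and the SAMPLE page are hypothetical. It only counts start tags; it doesn't validate the HTML.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup for a tiny page.
SAMPLE = """
<html>
  <body>
    <h1>The journey of 1000 miles</h1>
    <p>A <a href="http://www.slashfilm.com/2009/04/13/">link</a> and an
       <img src="http://pbskids.kids.us/images/sub-square-barney.gif" /></p>
    <p>Another paragraph.</p>
  </body>
</html>
"""


class TagTally(HTMLParser):
    """Count elements (by tag name) and attributes (as tag@attribute)."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        for name, _value in attrs:
            self.attributes[f"{tag}@{name}"] += 1


tally = TagTally()
tally.feed(SAMPLE)
tally.close()
print(tally.elements.most_common())
print(tally.attributes.most_common())
```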

What I've tried to do over the last two posts is get everyone to a common definition of the web. So, we all now understand that the web "is made of"...


  • DNS - The Domain Name System and protocol. Used to convert server names such as "www.myserver.com" into IP addresses such as 127.0.0.1

  • TCP/IP - The Transmission Control Protocol and Internet Protocol suite. Used to send messages around the network

  • HTTP - The Hypertext Transfer Protocol - Used to specify the purpose of a message sent over TCP/IP

  • HTML - Hypertext Markup Language - Used to mark up the hypertext documents carried in those messages.

The web is centred around the "Request/Response" messaging pattern and the linking and embedding of resources.

Cool. So now we're all on the same page. Onwards and upwards! Only 998 miles to go!
