Lezione 7 (+4): Supplement

A supplement for non-computer scientists reading Website Architecture Lezione 7: Clients & Servers and Lezione 4: Browsers.

These lessons should be pretty understandable for non-computer scientists; however, they're not as deeply relevant to your usual responsibilities as a web developer as they would be to a CS web developer. So you can read them in order to get a broad understanding of these layers and their roles in websites overall; you don't have to become very skilled at tweaking or tuning them. That being said, your reading about some of the more advanced concepts may serve you by helping to develop the basic/foundational picture more clearly, so you should probably read the whole lessons unless told otherwise (see below); you can read this document first.

Client/server model

This lesson introduces servers and the Internet. Most basically, a server is just a computer connected to the Internet that has the website files (.html, .css, .js, etc.) on its hard drive and can give them to whoever asks for them. We saw a simple diagram like this in class:

[Diagram: Client-Server Model]

The client is usually a human like you. Clients are represented by programs, such as web browsers, which are generally called user agents. A user agent may be a browser on your desktop, a browser on a laptop or mobile device, a browser on a video-game console, or even something like Google's page crawler which consumes and analyzes web pages. The server is just a computer which has a specialized purpose and thus may lack some of the typical features that home computers have (e.g. printer, mouse, monitor, and more). There is actually a program running on the server which listens for website requests, but we will see that more specifically in the next lesson; basically, the user agent talks to that program directly and sends it messages using an agreed-upon format. This format is called HTTP, and this lesson partly deals with the anatomy of an HTTP message.

Computer communication over the internet

A small section of the PDF talks about the basics of computer networking. You could read it out of curiosity, but you do not need to know any of it.

HTTP message format (part 1)

HTTP messages are mostly composed of simple text. A communication protocol is simply an agreed-upon message structure. Let's see some general examples of communication protocols, because they are excellent puzzles and build good problem-solving skills, even if you don't aspire to be a computer scientist or work with HTTP in any serious depth.

Communication protocols in general

For our example, let's say you're back in World War II and you want to use a telegraph to inform the base of the previous day's operations. You want to communicate how many submarines you observed in the area, how many you sank, and how many torpedoes you have left. There are very many ways you could design a protocol for this. Let's see a few.

Assumed order, delimited by spaces

You agree that the number of observed subs is first, then the number of sunk subs, then the number of torpedoes. You simply write the numbers in order with spaces between them:

12 1 8

That's not too bad. Not very human-readable. But it is decently lean; you could at worst say that the two spaces are "wasted", so the overhead is 2/6 or ~33% in this case. Can we do better?
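The space-delimited protocol can be sketched in a few lines of Python. This is only an illustration; the function names `encode_report` and `decode_report` are made up for this example.

```python
def encode_report(observed, sunk, torpedoes):
    """Write the three counts in the agreed order, separated by spaces."""
    return f"{observed} {sunk} {torpedoes}"


def decode_report(message):
    """Split on spaces and read the counts back in the agreed order."""
    observed, sunk, torpedoes = (int(part) for part in message.split(" "))
    return observed, sunk, torpedoes


message = encode_report(12, 1, 8)
# Share of the characters that are separators rather than information.
overhead = message.count(" ") / len(message)

print(message)                 # 12 1 8
print(decode_report(message))  # (12, 1, 8)
print(f"{overhead:.0%}")       # 33%
```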

Assumed order, tightly packed

Let's get rid of the spaces:

1218

Whoops. That's not very understandable; we can't be sure where one number ends and the next begins. This is a problem. (But hey, you could still probably deduce the right answer. Call the three parts of the message A, B, and C. We know that A ≥ B and B ≤ (most recent torpedo count - C). There are only a small number of possible "cuts" that would divide this message into three parts. Just try them and see if the numbers make sense. This kind of problem-solving strategy may not scale up very well, by the way.)
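The "try every cut" idea can be sketched as follows. The previous torpedo count of 10 is an assumed value chosen for illustration, and `plausible_readings` is a hypothetical helper name.

```python
from itertools import combinations


def plausible_readings(message, previous_torpedoes):
    """Try every way to cut the digits into three numbers A, B, C and keep
    the cuts that satisfy A >= B and B <= previous_torpedoes - C."""
    readings = []
    # Choose two cut positions strictly inside the message.
    for i, j in combinations(range(1, len(message)), 2):
        a, b, c = int(message[:i]), int(message[i:j]), int(message[j:])
        if a >= b and b <= previous_torpedoes - c:
            readings.append((a, b, c))
    return readings


# previous_torpedoes=10 is an assumption made for this example.
print(plausible_readings("1218", previous_torpedoes=10))  # [(12, 1, 8)]
```

With four digits there are only three possible cuts, so this is quick. The number of cuts grows rapidly with longer messages, and more than one cut may pass the checks, which is why the strategy does not scale.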

So what can we do about this? Let's assume the numbers are all less than 100, so we can assume two digits for each one and we'll put zeroes in there if necessary:

120108

This is actually the same size as our first example, though the first example would grow if more of the numbers reached two digits and this one wouldn't. So its benefit may have to be evaluated on a case-by-case basis with typical examples.
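A minimal sketch of the fixed-width version, assuming (as above) that every count is below 100. The names `WIDTH`, `encode_fixed`, and `decode_fixed` are invented for this example.

```python
WIDTH = 2  # agreed digits per number; assumes every count is below 100


def encode_fixed(observed, sunk, torpedoes):
    """Pad each count with leading zeroes to WIDTH digits and pack them."""
    counts = (observed, sunk, torpedoes)
    if any(not 0 <= n < 10 ** WIDTH for n in counts):
        raise ValueError("count does not fit in the agreed width")
    return "".join(str(n).zfill(WIDTH) for n in counts)


def decode_fixed(message):
    """Cut the message every WIDTH characters."""
    return tuple(int(message[i:i + WIDTH]) for i in range(0, len(message), WIDTH))


print(encode_fixed(12, 1, 8))  # 120108
print(decode_fixed("120108"))  # (12, 1, 8)
```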

In any case, this kind of thing is great for computers but not so great for humans.

Assumed order, arbitrary message length and content, boundaries

Let's bend our example a little bit and assume you might want to write an arbitrary amount of information for each component of the message. We'll do this because it demonstrates the advantages and disadvantages of this type of protocol better than our super-small example above with the three numbers. So let's assume you want to send a message describing how many subs you saw and where, what kinds of torpedoes you have left, etc.

Can we still use a space to separate them? Of course not; a space might occur inside the actual phrases. Let's pick a "boundary" to appear between each segment, still assuming that we can agree upon the order of segments (subs observed, subs sunk, torpedoes left). Let's say the boundary is "ASDFASDFASDFASDF". Now let's see a few messages:

Saw two U-boats within 10 knots of our location, another 20 knots away.ASDFASDFASDFASDFSunk none.ASDFASDFASDFASDFTwo explosive torpedoes left, four standard, one damaged.
3ASDFASDFASDFASDF0ASDFASDFASDFASDF6

For larger message content, it may be alright. For small message content, it is clearly inappropriate. You will just have to evaluate its typical appropriateness for each given problem. Note also that if the boundary word occurs within the actual message content, the one inside the message should be escaped, similar to our examples in HTML.
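Here is a rough sketch of the boundary approach. To stay short it does not implement escaping; it simply refuses a segment that contains the boundary. The function names are hypothetical.

```python
BOUNDARY = "ASDFASDFASDFASDF"


def encode_segments(segments):
    """Join the segments in the agreed order with the boundary between them."""
    for segment in segments:
        if BOUNDARY in segment:
            # A fuller protocol would escape the boundary instead of giving up.
            raise ValueError("segment contains the boundary; it would need escaping")
    return BOUNDARY.join(segments)


def decode_segments(message):
    """Cut the message wherever the boundary appears."""
    return message.split(BOUNDARY)


short = encode_segments(["3", "0", "6"])
overhead = 2 * len(BOUNDARY) / len(short)

print(short)                   # 3ASDFASDFASDFASDF0ASDFASDFASDFASDF6
print(decode_segments(short))  # ['3', '0', '6']
print(f"{overhead:.0%}")       # 91%
```

The overhead figure shows why this boundary is a poor fit for tiny messages: 32 of the 35 characters are boundary.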

Assumed order, arbitrary messages, boundaries of length info

Part of our problem with arbitrary message lengths is that we don't know when they're going to end, so we don't know when to divide the message into its components or perhaps when to turn off the machine and stop listening at the very end (we have not discussed this yet but it is a problem). How about this - we advertise how long the message is going to be:

16:Three last night4:None9:Seventeen

Well hey, that works pretty well. Hopefully you can see that the reader should not mistake "4:" or "9:" for part of the preceding segment, because the reader already knows that the first segment will run out just before that "4:" starts, for example; the next characters must be a length marker. (And hey, maybe we should have made "9:" something like "9!:" to show that the message ends after this segment. And then, why not just "9!"?)
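The length-prefixed format can be read with a simple loop. This is a sketch with invented function names; it assumes every segment is announced as a decimal length followed by a colon.

```python
def encode_lengths(segments):
    """Prefix each segment with its length and a colon."""
    return "".join(f"{len(s)}:{s}" for s in segments)


def decode_lengths(message):
    """Read a length, then exactly that many characters, and repeat."""
    segments = []
    pos = 0
    while pos < len(message):
        colon = message.index(":", pos)  # end of the length marker
        length = int(message[pos:colon])
        start = colon + 1
        segments.append(message[start:start + length])
        pos = start + length             # the next length marker starts here
    return segments


message = encode_lengths(["Three last night", "None", "Seventeen"])
print(message)                        # 16:Three last night4:None9:Seventeen
print(decode_lengths(message))        # ['Three last night', 'None', 'Seventeen']
print(decode_lengths("5:a:b:c1:x"))   # ['a:b:c', 'x']
```

The last call shows that colons and digits inside the content cause no confusion, because the reader counts characters rather than searching for a delimiter.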

There are very few problems with a protocol like this. The one big drawback might be that we still have an assumed order for the message segments. This makes the protocol somewhat less versatile and readable, among other annoyances.

Arbitrary order, arbitrary content, using headers + end delimiters

If we want an arbitrary order of message segments, we probably need to advertise explicitly which segments they are. Let's see something like this:

SUNK:None
SAW:Three last night
TORP:Seventeen

We have added "headers", which are pieces of information which are given before the principal content of the message. They appear first and are thus at the "head" of the document or the head of its segment, hence their name. Our earlier example with the "16:" actually used headers, in a sense, in that those delimiters were preceding pieces of information which told the reader something about what to expect next.

Now, this is pretty good. But remember that it may actually be transmitted like this, with no line breaks:

SUNK:NONESAW:THREE LAST NIGHTTORP:SEVENTEEN

Let's put a boundary after each segment:

SUNK:None!!!
SAW:Three last night!!!
TORP:Seventeen

And that's pretty much it; we have reached our destination. This type of protocol is a pretty decent mix of flexibility, readability, and efficiency, and computer scientists have seen fit to use it in many actual protocols that are read & written by both humans and computers. In particular, this most closely resembles HTTP.
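A small sketch of a reader for this header-style format, assuming "!!!" never occurs inside a value. The name `decode_headed` is made up for this example.

```python
END = "!!!"  # agreed marker that closes a segment


def decode_headed(message):
    """Return a dict mapping each header name to its content."""
    fields = {}
    for segment in message.split(END):
        segment = segment.strip()
        if not segment:
            continue
        # Only the first colon separates the name from the content.
        name, _, value = segment.partition(":")
        fields[name] = value
    return fields


one = decode_headed("SUNK:None!!!SAW:Three last night!!!TORP:Seventeen")
other = decode_headed("TORP:Seventeen!!!SAW:Three last night!!!SUNK:None")

print(one["SAW"])    # Three last night
print(one == other)  # True
```

Because each segment names itself, the two messages decode to the same report even though their segments arrive in a different order.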

There are a huge number of ways to design a protocol, and protocols are always tailored to their application environments, sometimes heavily. We could have arrived at this conclusion through a different path, and we also could go much further.

HTTP message format (part 2)

There are two types of HTTP messages: requests, sent from clients to servers, and responses, sent from servers to clients. They look very similar. Here are two abstract examples and two more concrete examples (still not precisely realistic):

Request (abstract):

(This is a request; I want to get something)
(zero or more headers here; one per line)
(blank line or end of message)
(payload/data if applicable)

Response (abstract):

(This is a response; info on whether successful)
(zero or more headers here; one per line)
(blank line or end of message)
(payload/data if applicable)

Request (concrete):

GET /images/august/fundraiser.jpg HTTP/1.1
BrowserID: Firefox/14.01
ExpectedResponseType: image/jpeg

Response (concrete):

HTTP/1.1 200 OK
ContentType: image/jpeg
ContentSize: 109822
CreationDate: 2012 Apr 11, 2:46 PM

actual JPEG raw data here

The only unrealistic parts of these concrete examples are the header names and values. They are pretty close, but the actual names are less obvious and explanatory. See the lesson PDF for realistic details.
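To make the layout concrete, here is a sketch that splits a message of this shape into its first line, its headers, and its payload. It uses the simplified header names from the example above and plain newlines (real HTTP ends each line with a carriage return plus newline). `parse_message` is a hypothetical helper, not part of any library.

```python
def parse_message(text):
    """Split an HTTP-style message into (first line, headers, payload)."""
    # The first blank line separates the headers from the payload.
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Only the first colon separates a header name from its value.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


response = (
    "HTTP/1.1 200 OK\n"
    "ContentType: image/jpeg\n"
    "ContentSize: 109822\n"
    "CreationDate: 2012 Apr 11, 2:46 PM\n"
    "\n"
    "actual JPEG raw data here"
)
start_line, headers, body = parse_message(response)

print(start_line)               # HTTP/1.1 200 OK
print(headers["CreationDate"])  # 2012 Apr 11, 2:46 PM
print(body)                     # actual JPEG raw data here
```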

Getting hands-on experience

The PDF mentions how you can fire up some hackish programs and write your own requests manually. You don't need to do that, especially if you only want a general awareness of requests. Instead of that "Getting started" section, you could just use the Net tab of Firebug to show you an excellent representation of requests & responses. If you don't see anything there, read the instructions to enable the Net tab and then refresh the page. Be sure to expand each request to see more details, click around on all the sub-tabs, and generally explore all you can.

Important uses of HTTP headers

There are many nuanced things you can do with HTTP headers, but if you're not a computer scientist or advanced programmer, you probably won't care. You should still learn about some of the most significant impacts that HTTP headers make, such as how they can be used to reduce internet traffic drastically and improve (apparent) download times of sites.

The most important uses of headers are all covered in the lesson PDF, plus a little more detail. You could skip the discussion of transfer encoding (all but the first paragraph of "Content length and.."). You don't really need to know about content disposition, nor any details of gzip compression (just knowing the concept is nice). When you get to the discussion on caching, you can switch to the relevant part of Lezione 4 and then jump back to 7 when you're done.

Browsers: caching and efficiency

Lezione 4 deals with browsers, including the quirkiness of browsers (a very real but decreasingly serious problem) and how to use them more efficiently. We are concerned with some of the efficiency lessons here. Read the sections on caching, sprite images, and, more optionally, embedded media.

The browser cache

Simple preview: The browser cache is a repository of images and files from websites you've recently visited. The browser may use the old copies from your hard drive rather than go and get them over the Internet again, which is far faster and is less of a hassle for the server and everyone else on the Internet who would otherwise have to deal with that redundant network traffic. There are problems with this kind of feature, though, because you have to be reasonably sure that you have an up-to-date copy of the file that you chose not to retrieve. There are many solutions to this problem and most of them involve the client & server communicating hints and requests about caching via HTTP headers.
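The basic idea of a cache can be sketched with a toy example. This is not how any real browser works; it assumes one simple freshness rule (a copy is reused until it is older than a fixed number of seconds), and `TinyCache`, `fake_fetch`, and the fake clock are all invented for illustration.

```python
import time


class TinyCache:
    """Keeps local copies and reuses them while they are still fresh."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self.fetch = fetch              # function that asks the server
        self.max_age = max_age_seconds  # assumed freshness rule
        self.clock = clock
        self.copies = {}                # url -> (time stored, content)

    def get(self, url):
        now = self.clock()
        if url in self.copies:
            stored_at, content = self.copies[url]
            if now - stored_at < self.max_age:
                return content          # fresh copy: no network traffic
        content = self.fetch(url)       # missing or stale: ask the server
        self.copies[url] = (now, content)
        return content


requests_made = []


def fake_fetch(url):
    """Stand-in for a real network request; records that it was called."""
    requests_made.append(url)
    return f"contents of {url}"


fake_time = [0.0]  # a clock we can move by hand
cache = TinyCache(fake_fetch, max_age_seconds=60, clock=lambda: fake_time[0])

cache.get("/bakesale.jpg")  # first visit: goes to the server
cache.get("/bakesale.jpg")  # served from the local copy
fake_time[0] = 120.0
cache.get("/bakesale.jpg")  # copy is too old, so it is fetched again

print(len(requests_made))   # 2
```

Three page loads cost only two requests here. The hard part in practice is choosing the freshness rule, which is what the caching headers negotiate.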

Practical notes for the curious: The cache in this scenario is a feature of browsers, and browsers are responsible for managing these files on your hard drive. So, where do the files live? In a folder managed by the browser: perhaps something like C:\Users\Mike\AppData\Mozilla\Firefox\...\Cache\bakesale.jpg. The phrase "Temporary Internet Files" may also refer to the cache. What about saved passwords and searches? Are they part of the cache? Not precisely. They represent a similar type of problem-solving strategy, in keeping local copies of things, and they are usually managed together in the user's preferences, but they are not exactly saved copies of website files, so they are not considered part of the proper cache.

Cost of requests

In general, requests are considered somewhat expensive, so web designers try to minimize the number of requests that are sent per page load. Think of all the HTTP headers that are sent for a request and its response, and keep in mind that there are usually more headers than in these examples (perhaps about 1000 bytes for a request plus its response). Those bytes are an overhead cost. Especially when the payload of each request is small, such as when transferring a large number of small image files, the overhead can become an unreasonably large percentage of the total data transferred.

To that end, you may decide to make some artistic or design decisions, like using fewer images, or you may decide to employ the trick of "sprite images", where you combine multiple small pictures into one image file and use CSS to present only a cropped region of the file. The lesson gives a good example, and there is also an example in the associated zip file. The bottom line is that you can have one file (and thus one HTTP request) for something like eight emoticon images, rather than eight small files & eight requests.
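Some back-of-the-envelope arithmetic shows the effect. It uses the rough 1000-byte header estimate from above, and it assumes (purely for illustration) that each emoticon is 500 bytes and that the sprite is exactly as large as the eight images combined.

```python
OVERHEAD_BYTES = 1000  # rough header cost per request + response (estimate)


def total_bytes(payload_sizes):
    """Bytes transferred when each payload travels in its own request."""
    return sum(size + OVERHEAD_BYTES for size in payload_sizes)


def overhead_share(payload_sizes):
    """Fraction of the transferred bytes that is header overhead."""
    return OVERHEAD_BYTES * len(payload_sizes) / total_bytes(payload_sizes)


emoticons = [500] * 8      # eight separate files (assumed 500 bytes each)
sprite = [sum(emoticons)]  # one combined file (assumes no change in size)

print(total_bytes(emoticons), f"{overhead_share(emoticons):.0%}")  # 12000 67%
print(total_bytes(sprite), f"{overhead_share(sprite):.0%}")        # 5000 20%
```

Under these assumptions the sprite moves the same pictures with less than half the bytes, and the browser makes one request instead of eight.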

Using sprite images is a very good idea when it's applicable, and people who want websites are starting to learn this concept and ask for it by name. Fortunately, there are even sites that can take your individual images and produce a sprite image for you, including the necessary CSS; search for something like "make sprite image" and you'll find plenty.

Speed of user experience in websites

Giving the user a fast browsing experience is more important than ever, especially with mobile devices that have slow transmission capabilities and comparatively slow processors. Fortunately, there are also more tools than ever which can help you to analyze why your site is running slowly. You can search for "website speed analyzer" and find several. They will give you advice on how to change your site and will probably even give a decent, quick explanation of what each of those things means. Most of the tweaks you can make are things that you'll see in lessons 7 & 4. You won't know how to configure the server yet in order to implement any recommended "server-side" changes, but we will see that in the lessons following. And the good news is that most of the time, the server has already been configured well enough and the most effective changes you can make are "client-side" changes, such as changes to HTML/CSS/JS.

Sending information with a request

The last part of Lezione 7 deals with sending information to the server. Imagine you are logging into a website and you are sending your username and password. That is included in the request. Where should it go? That's a good question with a few valid answers, but the designers of HTTP suggest that it goes in the body/payload area of the request (usually; see next paragraph), something like this:

GET /loginPage.html HTTP/1.1
ExpectedResponseType: text/html

JoeBlow
goNoles!

This example is actually incorrect for a few reasons. For one, there are several types of requests, and "GET" requests shouldn't have a body. A request that carries data in its body uses a different type. There is a "PUT" type, though it is actually fairly uncommon, and "POST" is used where you might think something like "PUT" would be appropriate; it's just an unfortunate consequence of history.
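For comparison, here is a sketch of how the same login data might travel in the body of a "POST" request. It is still simplified (a real request carries more headers, and real HTTP uses carriage return plus newline line endings). The field names `username` and `password` and the helper `build_login_request` are assumptions made for this example; the two header names shown are real HTTP headers.

```python
from urllib.parse import parse_qs, urlencode


def build_login_request(path, username, password):
    """Build a simplified POST request with form-style data in the body."""
    # urlencode produces name=value pairs joined by "&", escaping
    # characters that are not safe to send as-is.
    body = urlencode({"username": username, "password": password})
    lines = [
        f"POST {path} HTTP/1.1",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
        "",    # blank line: the headers end here
        body,
    ]
    return "\n".join(lines)


request = build_login_request("/loginPage.html", "JoeBlow", "goNoles!")
print(request)

# The receiving side can recover the fields from the body.
head, _, body = request.partition("\n\n")
print(parse_qs(body))  # {'username': ['JoeBlow'], 'password': ['goNoles!']}
```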

Anyway, you can read more about the types of requests and how to send information by reading that last part of the PDF. However, you may only get a vague feel for the material right now, and that's fine. It will be reinforced and developed further in later lessons, and you will get a chance to work with this information being sent (e.g. receive someone's username+password and do something with it), which will solidify the concept.