HTML for the English Major: A Tutorial

This document presents a few bare essentials of HTML. HTML is a method for marking documents up to make their structures explicit, so that machines can read them. HTML has gone through many versions since its invention a little over a decade ago. This tutorial presents the latest and best version of HTML that is in widespread use, called XHTML 1.0 Strict. This tutorial is written for nontechnical readers and introduces a handful of the most important markup elements. This document also demonstrates all of its own principles: everything mentioned in it is put to use in it, and, conversely, all the structural elements it contains are explained in its content. In order to keep things simple, it does not have much graphic styling. Use of Cascading Style Sheets (CSS) to improve the appearance of a document will be covered in a separate tutorial. A tiny bit of CSS will be introduced near the end of this tutorial, however, in order to help make sense out of some HTML features.

Contents

Concept of HTML in a Nutshell

HTML stands for Hypertext Markup Language. It is not a computer programming language. Rather, HTML is a small set of symbols used in a text document to clarify the structure of the document in such a way that software such as a Web browser can present that structure appropriately. For instance, a paragraph is a structural element of a document. Most browsers will, by default, present a paragraph with wordwrap until the last line and with a line of vertical space above and below the paragraph. Other document elements include headers, lists, and tables.

HTML cannot control how a document appears in a Web browser window, because browsers, media, hardware, connection speeds, and users' needs vary widely. Instead, HTML clarifies the structure of a document, and it is up to the individual browser to present that structure suitably.

Browsers ignore any extra space you have in your HTML code, such as tabs or extra lines, so you can format your HTML code any way you want without affecting the appearance of the document in the browser window. The browser uses the HTML symbols alone to determine how to display the content. Spacing is good to use in your code for your own purposes, such as to help you and other coders find elements when revising the document.
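
As a small illustration of this point, the two paragraphs below are laid out very differently in the code, yet a browser should render them identically. The sample sentence is made up for this sketch.

    <!-- Sample text invented for illustration. -->
    <p>Browsers collapse extra spaces and line breaks.</p>

    <p>
        Browsers    collapse
        extra spaces    and line breaks.
    </p>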

HTML should be considered an extension of punctuation. Over the centuries, punctuation has been developed to get across in writing structures of language that cannot be represented with letters. A capital letter signals the beginning of a sentence and a period the end. Commas designate clauses and parenthetic phrases. But, beyond the sentence level, writing relies heavily on spacing and font selection to express structural relationships. These things would vary greatly from computer to computer, depending on the size of the screen, memory capacity, graphical resources, etc. Hence, HTML does not use spaces to express structure, but, instead, symbolic markup, which each computer can interpret appropriately. And, just as Spanish punctuation lets you know where a question or exclamation begins as well as ends by enclosing it in ¡ ... ! or ¿ ... ?, likewise HTML encloses structural elements in pairs of symbols, one at the beginning and one at the end. HTML is conventional and consistent enough for computers to read it mechanically.

Elements can be divided up into two groups: containers and empty, or standalone, elements. Most elements of a document are containers — they contain text and certain other elements. For instance, a paragraph contains plain text and may contain emphatic or strong text or even an image. Other containers include tables, ordered and unordered lists, the list items that those lists contain, and emphatic text, which contains the text to be emphasized. Empty elements include images, horizontal rules (horizontal dividing lines), and line breaks that may occur within paragraphs for special purposes such as verse or addresses.

There are rules for what can contain what. Paragraphs may not contain paragraphs within them. Table cells and list items may contain paragraphs and anything paragraphs contain, but paragraphs may not contain tables or lists. Lists may only contain list items. The list items, in turn, may contain various other elements. (This is explained better below, under "Ordered Lists.")

The HTML symbols that designate structural elements of a document are called tags. Elements that contain text or other elements begin with an opening tag and end with a closing tag. The opening tag consists of a symbol for the element enclosed in the < and > characters: <p> for a paragraph, <table> for a table. The closing tag has an additional / character to distinguish it from the opening tag: </p> and </table>. So a paragraph begins with the symbol <p> and ends with the symbol </p>, with its content coming between the two tags. Empty elements, such as images and horizontal rules, are handled differently. Empty elements consist of single tags that open with the < character and end with a space (or line break in your code, which is interpreted as a space), followed by />:

    <hr />
        
    <img
        src="fun.gif"
        title="Image of us having fun."
        alt="We have been having fun."
    />
        

Any opening or standalone tag may contain, after the name of the element (such as p or img) one or more modifying attributes. An attribute specifies some ... attribute of the element. In the second example above, the img element has three attributes: src, title, and alt. The src attribute, in this case, provides the name (and perhaps the path, or location) of the file containing the image code. The title attribute provides text to appear in case the image does not arrive in a graphical browser, or when the user rolls the mouse so that the pointer hovers over the image. The alt attribute provides text to be rendered in a nongraphical browser, such as a text browser, a braille browser, or an audio browser.

Note that the attribute is always specified by name with an equals sign and then the value of the attribute, in quotation marks. There are no exceptions.
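
To see the rule at work, here is a rough sketch of a tag whose attribute values include plain numbers. Even the numbers go inside quotation marks. The file name, text, and dimensions are invented for this example.

    <!-- File name and dimensions are hypothetical. -->
    <img
        src="logo.gif"
        alt="Department logo."
        width="120"
        height="60"
    />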

HTML document elements can also be divided up between those that define an area of the document and those that surround a word or phrase. The first type of element is called a block-level element. Paragraphs, lists, and tables are block-level elements. The other type of element is called an inline element. Examples of inline elements are emphatic phrases, strong-appearing phrases, book titles, and links. Inline elements must always be contained within block-level container elements. For instance, emphatic text must always appear within a paragraph or list item or some such larger structure.
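
A minimal sketch of the distinction, with sample wording made up for the purpose: the paragraph is the block-level element, and the emphatic phrase is an inline element contained within it.

    <!-- Sample text invented for illustration. -->
    <p>This paragraph is a block-level element, and it contains
    <em>emphatic text</em>, which is an inline element.</p>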

Jump back to table of contents.

Getting Started

Before we look at the elements that make up an HTML document, you might like to have a first experience creating and viewing some HTML. Necessarily, this will mean doing some things that you do not yet understand. As you read along, however, each bit of what you will do now will become clearer in retrospect — I hope!

First, you need to open a pure text editor on your computer. On Windows machines, this will be Notepad, and, in Macintosh machines, it will be Simple Text or perhaps BBEdit. Then, mechanical as it sounds, just highlight the text below, from right after where it says "start copying" to where it says "stop copying" (and not including those two lines). You do this by holding your left mouse button down and dragging it over the text. If this doesn't work, or you don't have a mouse, you will need to copy the whole thing by hand, which could take some time. Anyway, assuming that you could highlight the text with your mouse, copy it, and plunk it (paste it) into your editor window. On most machines, you can do this by clicking your mouse while it is pointed on the editor window rather than the browser window, and then selecting "Paste" from the Edit menu. Then follow the instructions right below the "stop copying" line.

Start copying (below this line).

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
  <head>
    <title>First HTML Page</title>
  </head>
  <body>

    <h1>It Works!</h1>

    <p>This is an HTML document, and I can view it without
    the code gobbledy-gook in my browser window.</p>

  </body>
</html>
        

Stop copying (right above this line).

Once you've put the text above into your editor window, save it as a new document. The name under which you save it must take a special form. The first part (the name proper) may not have any spaces in it, and must consist only of letters, numerals, the underscore character (_), or the hyphen character (-), in any combination. Then must follow a period ("dot"), and then the four letters "html." Be sure to save the document to a place where you can find it right away! In Windows, this might be somewhere in your My Documents directory.

Take a moment to notice the tags and overall arrangement of your HTML code.

Next, leaving your editor window still open, open up any sort of file list, such as Windows Explorer in Windows, and find the listing for your file. Open the file from that listing. This usually means double-clicking on it. What should happen, if all goes well, is that a browser, such as Internet Explorer or Netscape Navigator, will open up and show you a nice, code-free rendering of the HTML page that you have just created.

To make life more interesting for you, go ahead and change the text between the <p> and </p> symbols in your editor. You can add as much or as little text as you like, so long as you keep the <p> at the beginning and the </p> at the end and do not do anything to any other part of the code. Save the file in your editor, switch windows to the browser, and refresh the browser (or reload the page). To do this, typically, you go to the View menu and find "Refresh" or "Reload." You should see the changes you have made reflected in the browser rendering.

Now that you have seen some HTML in action, let's look at the elements that make up a good HTML document. This will not take very long, and, soon, you will be able to create HTML documents from scratch and just as you want them.

Block-Level Container Elements

Paragraphs

Paragraphs are explained above. HTML does not provide some other useful block elements, such as a poetic stanza. For this reason, you wind up calling many things "paragraphs" that may not at first seem like paragraphs. Always be sure to have both your opening and closing <p> and </p> tags.

Headers

A header is an element of text used to head a section of the document. Graphical browsers generally cause headers to appear in boldface and in a larger size than the rest of the text. HTML provides you with a whole series of numbered headers representing the level of priority, so that a first-priority header is marked with the tags <h1> ... </h1>, and the priority or importance decreases as the numbers increase. You do not have to follow the numerical order exactly for subordinate headers, though, in this text, I have chosen to do so. For instance, under an item headed with an <h1>H1 Element</h1> element, you could have one with a small header, e.g., <h4>H4 Element</h4>. A header always occupies its own line of text in the browser window, and most browsers also insert vertical space above and below any header, regardless of how high its number is and therefore how small it is.
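
Here is a rough sketch of headers in descending order of priority. The header text is borrowed from section titles in this tutorial, but whether this tutorial's own source code uses exactly these levels is an assumption.

    <!-- Header levels chosen for illustration only. -->
    <h1>HTML for the English Major: A Tutorial</h1>

    <h2>Block-Level Container Elements</h2>

    <h3>Paragraphs</h3>

    <p>The text of the section on paragraphs would go here.</p>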

Ordered Lists and List Items

An ordered list is a list of numbered items. The list begins with the tag <ol> and ends with the tag </ol>. Within these tags, each list item must begin with the <li> tag and end with </li> (for "list item"). List items, in turn, may contain various other elements, such as paragraphs, but an ordered list may not contain anything except list items. In other words, there must be no tag between the opening <ol> tag and the first <li> tag.

You do not type numerals into your ordered list. Instead, the browser automatically inserts the numerals in the right order. If you move the list items around, the browser will reassign the numerals for you. It's good practice to write the list out with indenting, though this makes no difference to how the list will appear in the browser window:

    <ol>
        <li>First Item</li>
        <li>Second Item</li>
        <li>Third Item</li>
    </ol>
        

Here is how the above code will appear in your browser window:

  1. First Item
  2. Second Item
  3. Third Item

You can embed a list within a list, and do so to any level of hierarchical depth, so long as each embedded list is contained within a list item. The table of contents list at the top of this document is an example of such embedding. Here is a simpler example:

    <ol>
        <li>First Item</li>
        <li>Second Item
            <ol>
                <li>First Subordinate Item</li>
                <li>Second Subordinate Item</li>
            </ol>
        </li>
        <li>Third Item</li>
    </ol>
        

And here is the result of the above code as your browser interprets it for presentation:

  1. First Item
  2. Second Item
    1. First Subordinate Item
    2. Second Subordinate Item
  3. Third Item

Notice how the subordinate ordered list is enclosed within a list item in the parent list.

Cascading Style Sheets (CSS), to be covered in a separate tutorial, make it possible for you to change such things as the numbering method: to, say, Roman numerals (capital or lower-case), or to letters instead of numbers.

Unordered Lists and List Items

An unordered list is just like an ordered list, but does not appear with numerals. Instead, browsers generally render it as a bulleted list.

To create an unordered list, you simply use <ul> and </ul> tags instead of <ol> and </ol>. You can embed ordered or unordered lists within ordered or unordered lists in any permutation you wish.
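
As a sketch, here is the nested list from the previous section with its outer list changed to an unordered one. The inner list is left ordered, to show one of the possible combinations.

    <ul>
        <li>First Item</li>
        <li>Second Item
            <!-- The embedded list still sits inside a list item. -->
            <ol>
                <li>First Subordinate Item</li>
                <li>Second Subordinate Item</li>
            </ol>
        </li>
        <li>Third Item</li>
    </ul>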

Tables, Table Rows, and Table Cells (or "Table Data")

A table is a structure that has columns and rows and is best used to set forth information that follows that structure, such as a table of faculty e-mail addresses and telephone numbers:

English Department Faculty List

  Name             E-Mail               Telephone
  Amittai Aviram   avirama@gwm.sc.edu   777-2058
  Stan Dubinsky    dubinsk@vm.sc.edu    777-2056
  Judith James     jamesj@gwm.sc.edu    777-5063
  William Richey   richeyw@gwm.sc.edu   777-2054

To create a table, you open the element with a <table> tag. You can specify the border width in pixels, as well as the cellpadding and cellspacing. Cellpadding is the amount of space between the text within a table cell and the borders around it and is a very useful attribute. (In the table above, cellpadding is set to "5.") Cellspacing is the amount of space between cells and is, in this writer's experience, an utterly useless attribute, because any cellspacing besides "0" always looks bizarre. (But feel free to try out other numbers.) These three attributes of the table element smack of presentational rather than structural concerns (of how things look rather than what things are), but they are a rare exception to the structure-minded spirit of XHTML.

In the same vein, XHTML allows you to specify the width of a table, if you need to do so. If you do not, most graphical browsers will make the table only as wide as is necessary to accommodate the data presented in it, causing text to wrap if necessary in order to keep the table's width from exceeding that of the browser window. For an example of when you might wish to specify the width of tables, suppose you have a page with two tables, one below the other. You may feel that the two tables need to appear comparable to each other, and therefore should be forced to have the same width. If you specify the width with a mere numeral in quotation marks (width="500"), that will force compliant graphical browsers to draw the table at the specified width in pixels — but you should almost always avoid doing this. It is almost always better to specify the width of a table as a percentage of the total window width: width="85%". Specifying the width does have the advantage of helping the browser to draw the table by removing one set of calculations from its drawing process, which may sometimes result in faster performance as the page comes into view.

Next, optionally, you can specify a caption element for your table. This will appear by default right above your table, usually centered. This is a very useful element, not only for graphical and text browsers but also for audio browsers, since it makes the purpose of the table clear.

Then, still between the opening <table> and closing </table> tags, you must have at least one table row element, and that row must have at least one cell element. The table row is marked with <tr> and </tr> tags. The cell begins with <td> and ends with </td>. The "td" stands for table data. (They did not ask me for an opinion when they made this up!) In place of a regular table cell of the "table data" type, you can use a th element for a row or column label. ("th" stands for "table header," which makes some sort of sense.) By default, browsers usually display the text in a th element in boldface, which is useful.

If you use a th header element for a column or row, it's good practice to specify a scope attribute as either "col" or "row." This tells audio browsers whether the header applies to its column or its row and helps prevent disorientation when someone listens to a table being read aloud.

Here is the code for creating the table above:

    <table border="1" cellpadding="5" cellspacing="0">
        <caption>English Department Faculty List</caption>
        <tr>
            <th scope="col">Name</th>
            <th scope="col">E-Mail</th>
            <th scope="col">Telephone</th>
        </tr>
        <tr>
            <th scope="row">Amittai Aviram</th>
            <td>avirama@gwm.sc.edu</td>
            <td>777-2058</td>
        </tr>
        <tr>
            <th scope="row">Stan Dubinsky</th>
            <td>dubinsk@vm.sc.edu</td>
            <td>777-2056</td>
        </tr>
        <tr>
            <th scope="row">Judith James</th>
            <td>jamesj@gwm.sc.edu</td>
            <td>777-5063</td>
        </tr>
        <tr>
            <th scope="row">William Richey</th>
            <td>richeyw@gwm.sc.edu</td>
            <td>777-2054</td>
        </tr>
    </table>
        

Notice the indenting, which helps the coder to find his or her way in the code. It has no bearing on how the results appear in the browser window.
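
To give a table a width expressed as a percentage, as recommended earlier in this section, you add a width attribute to the opening tag. The following is only a sketch: it shortens the faculty table to two columns and one person for brevity, and the 85% figure is simply the value used as an example earlier.

    <!-- Shortened version of the faculty table, for illustration. -->
    <table width="85%" border="1" cellpadding="5" cellspacing="0">
        <caption>English Department Faculty List</caption>
        <tr>
            <th scope="col">Name</th>
            <th scope="col">Telephone</th>
        </tr>
        <tr>
            <th scope="row">Amittai Aviram</th>
            <td>777-2058</td>
        </tr>
    </table>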

Block Quotations

If you have an extended quotation, you can place it in a blockquote element by surrounding it with the <blockquote> and </blockquote> tags. A blockquote element may contain such things as paragraphs, but a paragraph may not contain a blockquote. Normally, if you have a block quotation, you would finish the paragraph before it with your closing </p> tag, and then place the quoted paragraphs within your <blockquote> ... </blockquote> tags. Graphical browsers usually present block quotations by default with an indented left margin and a line of vertical space above and below. Authors should not use this element just to make something indented, however. If you have text that is not a quotation but that you want to indent, you should put it into the appropriate element, such as a paragraph, with an appropriate class attribute, and then either write a CSS class or have your webmaster do so, so as to increase the left margin of that element.
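
The arrangement described above might be sketched as follows. The text is placeholder wording, not a real quotation.

    <!-- Placeholder text; substitute your own prose and quotation. -->
    <p>The paragraph that introduces the quotation ends here.</p>

    <blockquote>
        <p>First paragraph of the quoted passage.</p>
        <p>Second paragraph of the quoted passage.</p>
    </blockquote>

    <p>Your own text resumes in a new paragraph.</p>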

Empty (Standalone) Elements

This category is really a mix of two, since some empty elements are at the block level and others inline. Inline elements must be contained within block-level elements. Images and line breaks are inline elements, so they may not appear outside of some sort of containing element, such as a paragraph or, if need be, a division. (See further below about the <div> tag and the division element.)

Images

When you embed an image in an HTML document, the image itself is not actually part of the document's code. Instead, the HTML source code has an img element in it, which contains information linking the current document with another file, which, itself, has the code for the image. Indeed, image files and HTML document files are very different in nature: HTML files consist purely of text characters and symbols, whereas image files are made up of binary code that cannot be read by human beings. The img element's tag contains information that enables the browser to request the image file as a separate item and display it in the right place with the right amount of space reserved for it. (The section below on servers explains more about this "requesting" process.)

The file containing the actual binary code for the image is called the image's source. Since text and audio browsers do not display images, there also needs to be some sort of text in the img element to stand in place of the image in those contexts. Accordingly, the image tag must specify, at a minimum, two attributes: (1) the source of the image, and (2) an alternative text. It's also a good idea to specify (3) a title for the image. Finally, every img element should also specify the image's width and height in pixels, to help the browser apportion space for it.

The source attribute is spelled src in the <img> tag. Its value is the name of the separate file containing the image code. Whether you create your own image or get it (legally!) from some other source, you should always open it up first in a graphics editing program such as Adobe Photoshop or Jasc Paint Shop Pro. At the very least, then, you should resize it to the size you want it to have on your Web page, and then save it under the desired name and in a place where you can find it. The latter place will be the value of your src attribute, while the size dimensions will be needed for the width and height dimensions. Here's an incomplete img element, specifying only the src attribute:

(Incomplete markup — do not copy this example!)

    <img src="http://www.cla.sc.edu/engl/faculty/faculty.jpg" />
        

If the HTML document file and the image file are on the same machine, the address can be a relative address. In the example above, if the document were in the same directory on the same machine as the image, the src attribute could have had simply "faculty.jpg" as its value.

The alternative text attribute appears in the tag as alt, and its value should provide a text equivalent of the informational value of the image, if there is any. The value of the alt attribute is used by text browsers such as Lynx in place of the image, and audio or braille browsers may use the same text for the same purpose. The alt attribute is mandatory, and composing its value gives you an opportunity to think about why you are putting the image there in the first place. Does your photo of the university department's faculty show its diversity and its number? Does your photo of a student reading a book outdoors show how inspired your students are? If your image really contributes no information and is purely decorative, you can put alt="" into the tag. Especially important is informative text in the alt attribute of any image serving as a link, such as a custom-made button.

It's also wise to specify a title. This should be more of a descriptive title of the image. Before the image has finished loading into a graphical browser, or if the image cannot load for some reason (such as a problem with the server or connection), the browser may display the title in a box in the place where the image would have appeared. Also, many graphical browsers make it possible to view the title attribute's text as a tooltip-like text balloon when the user moves the mouse pointer over the image. Generally, if the title is left out, a graphical browser will use the alt text for these purposes.

Finally, the dimensions of the image in pixels are specified. This may sound like a matter of how something looks rather than what something is, and therefore another presentational feature sneaking into XHTML, but an image is, let's face it, essentially visual, so its dimensions really are part of its nature. The dimensions are width and height. You can usually get these dimensions accurately from your image editing program, as just mentioned.

So the above img tag must be revised to look something like this:

    <img
        src="http://www.cla.sc.edu/engl/faculty/faculty.jpg"
        title="Photo of the USC English Department Faculty."
        alt="Our diverse faculty has over 50 members."
    />
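
Note that the example above still leaves out the width and height attributes described earlier. Here is a sketch of the same tag with them added. The pixel values are invented purely for illustration; you would substitute the real dimensions reported by your image editing program.

    <!-- The width and height values below are hypothetical. -->
    <img
        src="http://www.cla.sc.edu/engl/faculty/faculty.jpg"
        title="Photo of the USC English Department Faculty."
        alt="Our diverse faculty has over 50 members."
        width="400"
        height="300"
    />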
        

Further details about the appearance of an image should be controlled by means of a style sheet. Use of style sheets to control the appearance of HTML documents will be covered later in a separate tutorial.

As an inline element, an image must be contained within some other, block-level element, such as a paragraph, a list item, a table cell, a division, or even a header.

Horizontal Rules

A horizontal rule is a horizontal line that appears drawn across the screen at the point in the text designated by the position of the tag in the HTML code. It is useful for separating divisions in a document. A horizontal rule is not an inline element but a block-level element, parallel with paragraphs, lists, and tables, so it does not have to be contained in any other block element and should not be contained in any except, optionally, a table cell. The horizontal rule tag looks like this: <hr />.


Line Breaks

A line break is a point where word wrapping is temporarily halted and the line of text is broken, to continue on the next line. This is very useful for rendering poetry or addresses. It can even occur within a header, to cause the header to be broken up into two lines, without any intervening vertical space. The tag always appears this way: <br />.

A line break is an inline element, so it must be contained within a larger, block-level element, such as a paragraph or a header.

Here is an example of a couple of stanzas of verse, first in markup and then rendered by your browser:

Markup

    <blockquote>
    <p>
    Simple Simon met a pieman<br />
    Going to the fair.<br />
    Said Simple Simon to the pieman,<br />
    "Let me taste your ware!"<br />
    </p>
    <p>
    Said the pieman to Simple Simon,<br />
    "Show me first your penny."<br />
    Said Simple Simon to the pieman,<br />
    "Indeed, I have not any."<br />
    </p>
    </blockquote>

        

Rendered

Simple Simon met a pieman
Going to the fair.
Said Simple Simon to the pieman,
"Let me taste your ware!"

Said the pieman to Simple Simon,
"Show me first your penny."
Said Simple Simon to the pieman,
"Indeed, I have not any."

The above code would have worked fine without the <br /> at the end of each stanza's last line, just before the closing </p>. I put it there just out of an obsession with consistency, and it doesn't hurt anything.
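
The same tag handles the other uses mentioned at the start of this section. Here is a rough sketch of an address and of a header broken over two lines. The name, address, and header text are all invented for the example.

    <!-- Name and address are made up for illustration. -->
    <p>
    Jane Reader<br />
    123 Main Street<br />
    Anytown, SC 29000
    </p>

    <!-- A header split into two lines, with no extra vertical space. -->
    <h2>Simple Simon:<br />A Nursery Rhyme</h2>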

Inline Container Elements

All inline elements are required to be contained within block-level elements. This rule includes links: a link must be contained within a paragraph, a list item, a table cell, or a division. The elements covered in this section are inline elements that also contain something. Emphatic text, strong text, and title citations may only contain text or links that in turn contain text. A link may contain plain text, other inline elements such as emphatic text, or, indeed, an image. Because a link is itself an inline element, XHTML 1.0 Strict does not allow it to contain a block-level element such as a whole paragraph or header.

Emphatic Text, Strong Text, and Title Citations

Emphatic and strong text and title citations will be considered together because all three work more or less the same way, by designating a word, phrase, sentence, or other text unit for some sort of special treatment. Strong text is best used to cause text to appear more forceful, and graphical browsers typically render strong text by default as boldfaced. (Every time you see a term such as strong text boldfaced in this document, that's because it's actually contained within a strong text element.) Likewise, emphatic text is for emphasis and usually appears in italics in your graphical browser window. Title citations (surrounded by the <cite> and </cite> tags) should be used for the titles of books, plays, films, works of music, etc. These, too, generally appear as italics in graphical browsers, which thus follow the usual modern printers' convention of rendering both emphatic text and the titles of independent works of art in italics.

Here is a good instance, however, of the difference between HTML and word processing. HTML makes a distinction between titles and emphases because they are different things, even if they are usually treated the same way visually. With the use of Cascading Style Sheets (CSS), to be covered in a separate tutorial, you (or your webmaster) can, indeed, set the presentation of these two elements to be different. This is discussed further in the section on <div> and <span> below.
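
A minimal sketch of the three elements together in one paragraph follows. The sentence is made up for illustration, and the book title is just a familiar example.

    <!-- Sample sentence invented for illustration. -->
    <p>You <strong>must</strong> return the book by Friday. I
    <em>really</em> mean it this time: my copy of
    <cite>Middlemarch</cite> has been gone for a month.</p>

In a typical graphical browser, the strong element would appear in boldface and the other two in italics, even though the markup keeps the emphasis and the title distinct.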

In printing, we use italics for several other purposes besides emphases and titles, and you may find the choice of <em> ... </em> and <cite> ... </cite> limiting. What about foreign words and phrases? What about technical terms when they are first introduced? What about English sentences used as specimens — examples of dialect, grammar, or usage? For these purposes, the i element and its <i> ... </i> tags may be useful. These are covered in the next section.

The Holdover <b>, The Still-Useful <i>, and a Glance at Cascading Style Sheets

XHTML 1.0 Strict allows the use of <b> ... </b> tags for boldface text and <i> ... </i> tags for italic text. These are holdovers from older versions of HTML, which did not distinguish structure from presentation as sharply as the current version does. Use of these elements is generally unnecessary and discouraged, in favor of strong and em elements, respectively, as well as cite.

There are some instances, however, where the i element comes in very handy: specifically, for foreign expressions, for the introduction of technical terms, and for text used as specimens of dialect, grammar, usage, or the like:

Code:
    <p>E-mail abbreviations, such as <i>FWIW</i> to mean "for
    what it's worth" and <i>AFAIK</i> for "as far as I know," have
    become a sort of computer-age <i>argot</i> — a
    <i>nonstandard</i> usage — not likely to appear yet in the
    <cite>Oxford English Dictionary</cite> and therefore not exactly
    <em>formal</em> Standard English.</p>
        
Result:

E-mail abbreviations, such as FWIW to mean "for what it's worth" and AFAIK for "as far as I know," have become a sort of computer-age argot — a nonstandard usage — not likely to appear yet in the Oxford English Dictionary and therefore not exactly formal Standard English.

But, in fact, foreign expressions and language specimens are not only distinct from emphatic expressions and title citations, they are also different from each other. Suppose later I might wish to have browsers render them distinctly. So long as I have used the same tags to designate both foreign expressions and language specimens, I will not be able to get the browser to distinguish them visually. The solution to this problem is to use the class attribute in conjunction with a Cascading Style Sheet to create a subdivision within the category of the i element, so as to have one class of i elements for foreign expressions, another for technical terms, and a third for language specimens. While I am at it, I will use the lang attribute to specify the language of the foreign expression, in case I might later wish to use this information in some special way in a program. The class and lang attributes are standard attributes — the former for any element at all, in conjunction with CSS, and the latter for any element that contains text (and can therefore have a language). I've also distinguished the kind of title citation I have by giving it a class attribute whose value is book. Here is the same passage, with the new attributes added:

    <p>E-mail abbreviations, such as
    <i class="specimen">FWIW</i>
    to mean "for what it's worth" and <i class="specimen">AFAIK</i>
    for "as far as I know," have become a sort of computer-age
    <i class="foreign" lang="fr">argot</i> — a
    <i class="technical">nonstandard</i> usage —
    not likely to appear yet
    in the <cite class="book">Oxford English Dictionary</cite> and
    therefore not exactly <em>formal</em> Standard
    English.</p>
        

To demonstrate the use of CSS with the class attribute — very briefly — I have created a separate file called tutorial.css that contains definitions of my classes. I have decided to change the foreground and background colors for foreign expressions, language specimens, and book titles, to make them all look different from each other. I have also increased the weight of the font in these elements so as to make their colors easier to see. I have left emphatic text with its default rendering.

    cite.book {
        font-style: italic;
        font-weight: bold;
        color: blue;
        background-color: white;
    }

    i.foreign {
        font-style: italic;
        font-weight: bold;
        color: red;
        background-color: #99ff99; /* Numeric code for light green. */
    }

    i.technical {
        font-style: italic;
        font-weight: bold;
        color: #009900; /* Dark green. */
        background-color: white;
    }

    i.specimen {
        font-style: italic;
        font-weight: bold;
        color: white;
        background-color: #009900; /* Numeric code for dark green. */
    }

        

The HTML document you are reading is linked to this CSS file by means of a link element, which occurs near the top of this document's source code in the head element. Both the head and link elements are explained further below.

Here is the result of the new code, with the class (and lang) attributes specified, and with the CSS classes defined:

E-mail abbreviations, such as FWIW to mean "for what it's worth" and AFAIK for "as far as I know," have become a sort of computer-age argot — a nonstandard usage — not likely to appear yet in the Oxford English Dictionary and therefore not exactly formal Standard English.

Graphical browsers that do not support CSS will likely still render all the i elements as italics, but that is not bad as a fallback.

The value of a class attribute is the same as the word following the dot in its CSS entry, which is, by the way, called a CSS class selector. The rules for the value of the class attribute and its corresponding CSS class selector are fussy: each single selector-and-class-name must be only a single word, with no spaces, and should only consist of letters, numerals, or the hyphen, in any combination but starting only with a letter. A class value of foreign or foreign1 is OK, but not foreign? or *123foreign.

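The lang attribute's value must be chosen from a set of standard language codes, most of them two-letter abbreviations, such as en for English, fr for French, de for German, es for Spanish, and ru for Russian. Some of these happen to match the two-letter country codes at the ends of Web addresses outside the US, but the two sets are separate standards: a language code names a language, not a country, so the code for Japanese is ja, while the country code for Japan is jp.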

Jump back to table of contents.

Links ("Anchors")

The list above should make the use of the first three types of inline elements obvious. Links, however, require some special treatment. For historical reasons, a link element is not technically called a "link" but an anchor, so it is marked up with the opening <a> and closing </a> tags. (I don't know why somebody decided to call it an "anchor," but now we are stuck with it.) Just as an image element (img) must have at least the source of the image specified as the value of the src attribute, the target of the link (or "anchor") must be specified in the opening <a> tag, using the href attribute. The target of the link is the address to which the browser should go next when you click on the link, and href stands for "hypertext reference," which is more or less what it is. The value of the href attribute is an address, also known as a URL, just like the value of the src attribute. Just as with the src attribute of the img tag, so here, you can use a relative address if the document and the target of the link are on the same machine and therefore have the same beginnings to their URLs — for instance, http://www.cla.sc.edu/index.html and http://www.cla.sc.edu/departments.html. In such a case, a link on the first page referring to the second could look like this:

    <a href="departments.html">Departments</a>
        

Otherwise, however, if the target of the link is found at a different host — the stuff coming immediately after "http://" is different — then you must specify the entire URL, including the http:// at the beginning:

    <a href="http://www.amittai.com/">Amittai's Website</a>
        

What goes between the opening <a> and closing </a> tags is the text that will be used as the link. By default, browsers usually highlight this text in some way, usually by rendering it as underlined blue text.

Underscoring for purposes other than links is not considered a matter of structure, so it is not supported by the version of HTML taught in this tutorial (XHTML 1.0 Strict). You can get underscores to appear on most browsers by designating some other element to be rendered as underscored text by means of a Cascading Style Sheet (CSS). Cascading Style Sheets are a technique for instructing browsers to render HTML structures according to specified presentational rules. They are a big topic to themselves and will not be covered here. Moreover, if you are creating Web pages for a larger website that has a Web manager or webmaster, it's probably better just to tell the webmaster how you wish to have your structure appear, and have the webmaster create the style sheet for you. The important thing is to designate the element structurally that you would like to see rendered as underscored text. See the next section, on the div and span elements, for a tiny introduction to this topic.

The target of a link may be something other than an external document. Instead, it may be a different part of the same document. For instance, you may have a table of contents at the top of the document in the form of an ordered or unordered list, where each item in the list is the title of a section and is also a link, so that you can click on that title and jump to the item. Then, in the item itself, there may be a similar link back up to the table of contents. The links used for this purpose are the same in form as links that point to external documents. The only difference is in the target address. Instead of using a URL or relative address, you use a fragment identifier. A fragment identifier is the equivalent of an address, but is used for a fragment or part of a document. It always has the symbol # in front of it. The following link points to a fragment designated "top":

    <a href="#top">Jump to top of document</a>
        

In order for this to work, you have to make identifiable that place in the document to which the link points. You accomplish this by giving an element an id attribute. For instance, a header can serve as the target fragment, and can contain an id: <h2 id="info">Info</h2>. This is the preferred method for designating a place in your document as an identifiable fragment.
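
The two halves of an internal link can be sketched together as follows. This is only an illustration: the id value intro and the heading text are hypothetical, and the link back assumes, as in this document, that some place near the top has been identified as top.

    <!-- In the table of contents: a link to the section. -->
    <ul>
        <li><a href="#intro">Introduction</a></li>
    </ul>

    <!-- Further down: the header that serves as the target. -->
    <h2 id="intro">Introduction</h2>
    <p>The text of the section goes here.</p>
    <p><a href="#top">Jump back to table of contents.</a></p>

Note that the # appears only in the href value of the link, never in the id value of the target.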

This document already demonstrates the use of this technique for internal links, with a table of contents at the top linked to sections and links below each section allowing the user to return to the table of contents. All the target fragments are identified with id attributes in the headers.

You can combine a URL to another document and a fragment identifier within that other document so that a link will take you to a specific part of that other document. In that case, the href attribute will have as its value the URL, followed without space by the fragment identifier, which is separated from the URL by the # character:

    <a href="another_document.html#part_in_middle">
        Go to a place in the middle of another document.
    </a>
        

Aside from linking to another document or to another place in the same or another document, you can, finally, use an anchor to link to an e-mail address. Generally, the user's browser will handle such a link by opening the default mailing program when the user clicks on the link, with the address of the recipient already entered into the "To" field of the outgoing e-mail. To create this kind of link, you use the href attribute as before, but the value of it is the word mailto, followed by a colon, followed by the e-mail address desired:

    Contact the author:
    <a href="mailto:amittai.aviram@gmail.com">
    amittai.aviram@gmail.com
    </a>
        

There is an example of a mail link at the very bottom of this document. Since the tie-in to mailing depends entirely on whether the user happens to have both mailing and browser software on his or her system and happens to have the two of them working together, you cannot assume that the mail link will work. Therefore, always put the actual e-mail address between the opening and closing tags of the a element. That way, people who can't just click and mail will know your intended address and can send an e-mail to it some other way.

If you would like to have a link to a Usenet newsgroup, you use the very same technique as for linking to an e-mail address, except that you use the prefix news: instead of mailto:. The very same rules apply about putting the name of the newsgroup in the text of the link, in case the user's computer is not set up to open a newsreader program automatically — which is often the case. Another technique for linking to a Usenet newsgroup uses a link to the Google News Service on the Web. If you look at the Resources list at the end of this tutorial and then look at its source code, you will see that technique in action.
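
A news link might look like the following sketch. The newsgroup named here, comp.infosystems.www.authoring.html, is chosen only for illustration; substitute whatever group you actually mean.

    Discuss HTML on Usenet:
    <a href="news:comp.infosystems.www.authoring.html">
    comp.infosystems.www.authoring.html
    </a>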

Jump back to table of contents.

Abbreviations and Acronyms

If you have been reading this document in an up-to-date graphical browser, you may have noticed a tooltip-type text bubble pop up when you have drifted your mouse across the text, saying either "Hypertext Markup Language" or "Extensible Hypertext Markup Language," or else "Cascading Style Sheets." This would occur whenever your mouse pointer is on top of one of the three abbreviations — respectively, HTML, XHTML, or CSS. HTML provides two interesting inline elements, one for abbreviations and the other for acronyms: abbr and acronym, respectively. In both cases, you should set the title attribute to the value of the expanded meaning of the abbreviation or acronym. This procedure not only enables compliant browsers to display an instant definition of the abbreviation or acronym; soon, browsers will offer the user the option of expanding all abbreviations and acronyms in the text or contracting them back at will, or displaying the expanded version of the first occurrence of every abbreviation and acronym. This should work for text, braille, and audio browsers as well as graphical browsers. Accordingly, I have enclosed every instance of every acronym in this document in <acronym> ... </acronym> tags, and have set the title attributes appropriately. To save myself the tedium of having to type the rather lengthy markup every time, I have just typed the acronyms all the way through at first, and then used a global search-and-replace function in my editor to change all occurrences of those acronyms so as to bear their tags.
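
In source code, the markup just described looks roughly like the following sketch. The sentences are made up for illustration; only the element names and the title attribute come from the discussion above.

    <p>This page is written in
    <acronym title="Extensible Hypertext Markup Language">XHTML</acronym>
    and styled with
    <acronym title="Cascading Style Sheets">CSS</acronym>.</p>

    <p>The <abbr title="World Wide Web Consortium">W3C</abbr>
    publishes both specifications.</p>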

Some framers of the current XHTML specification were clearly confused about the word acronym, which really means pronounceable letterwords, such as radar, LAN, and UNICEF. The apparent redundancy of the abbr and acronym elements may result more from competition among browser makers — "browser wars" — than from need or logic — a point underscored by drastic differences in browser support. In short, these two elements are a part of HTML still in flux.

Jump back to table of contents.

<div> and <span>

If you have certain structural elements in your document for whose designation the above HTML vocabulary does not suffice, you can mark your element at the block level with <div> and </div> tags, and, on the inline level, with <span> and </span> tags. Since these tags express no specific meaning, their function is left entirely up to style sheets. Usually, a div (division) or span element will have a class attribute specified. The class name is then used in the style sheet to control the element's presentation.

For instance, suppose I have a literary-critical essay in which I quote a block of text and then comment on a passage within that block. I might wish to highlight the passage in a different color. I will call that passage a span and set the class value to highlighted and then use a CSS class definition to change its appearance:

    <p>Chaucer's narrator comments on the Monk's
    preference for hunting over life in the cloister
    with what sounds like approval:</p>

    <blockquote lang="enm">
    <p>
    And I seyde that his opinion was good.<br />
    <span class="highlighted">What sholde he studie and make hymselven
    wood,<br />
    Upon a book in cloystre alwey to poure,<br />
    Or swynken with his handes, and laboure,<br />
    As Austyn bit?  How shal the world be served?<br /></span>
    </p>
    </blockquote>

    <p>In most of this passage, the narrator uses
    <i class="technical">free indirect discourse</i>,
    narrating about the Monk in third
    person but imitating what presumably are the Monk's terms and
    arguments, as if to impersonate him.  This device suggests the
    possibility of irony, so that the narrator may be connoting a
    critical view without expressing it.</p>
        

Then, the following code occurs in my style sheet:

    span.highlighted {
        font-weight: bold;
        color: black;
        background-color: yellow;
    }
        

Here are the results:

Chaucer's narrator comments on the Monk's preference for hunting over life in the cloister with what sounds like approval:

And I seyde that his opinion was good.
What sholde he studie and make hymselven wood,
Upon a book in cloystre alwey to poure,
Or swynken with his handes, and laboure,
As Austyn bit? How shal the world be served?

In most of this passage, the narrator uses free indirect discourse, narrating about the Monk in third person but imitating what presumably are the Monk's terms and arguments, as if to impersonate him. This device suggests the possibility of irony, so that the narrator may be connoting a critical view without expressing it.

Here is an example of how to use a nonce div element in conjunction with CSS code:

    <div class="centered">
        <h4>Centered Text</h4>
        <p>This text is centered
        below a centered header.</p>
    </div>
        

This is added to the tutorial.css file:

    div.centered {
        text-align: center;
    }
        

And here is the result:

Centered Text

This text is centered below a centered header.

Watch for a follow-up to this tutorial that will explain the basics of style sheets in more detail.

The div element also comes in handy when you need to enclose an anchor in a container but a paragraph or other such textual structure is inappropriate, because the anchor really is off by itself. This document uses a div element for this very purpose, to contain the <a name="top"></a> at the top of the document that serves as the target for links pointing back up to the table of contents.

Jump back to table of contents.

Comments

Often, it is a good idea to insert some text into your document that you do not want the world to see in a browser window, but that you and any other coder should see whenever you look at the document source code. You can insert reminders, for instance, to update some section periodically, or some text that will help you find a place that you need to revise often. An HTML comment is marked with an opening <!-- and a closing --> tag. Anything between the <!-- and --> will never appear in the browser window but will, of course, appear in the source code. Another way that comments can come in handy is to enable you to mask out a section of HTML. For instance, some part of a page may be temporarily invalid, but you may wish to keep the text there in case you later want to have it displayed again. You can simply place the whole thing in a comment and it will disappear from the browser without a trace.
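
Both uses can be sketched as follows. The reminder text and the masked-out paragraph are invented for illustration.

    <!-- REMINDER: update the office hours below every semester. -->
    <p>Office hours: Tuesdays, 2 to 4 p.m.</p>

    <!--
    <p>The spring reading list will be posted here.</p>
    -->

One caution: the text inside a comment should not itself contain two hyphens in a row, since that sequence is reserved for marking the end of the comment.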

Jump back to table of contents.

Head and Body

The discussion so far has provided you with markup for the text in the document. An HTML document also must have a little bit of markup surrounding the entire content, to enable the browser software to process the document correctly. First of all, the entire document is considered one big html element. We also say that html is the root element, from which all the other elements "branch out." (The metaphor is a bit abstract.) So the entire document must be marked up with the opening <html> and closing </html> tags. There must be only one html element in the document.

    <html>
    <!-- Your whole document goes here. -->
    </html>
        

The <html> tag ought to contain a couple of special attributes, to be discussed shortly.

In order to see the <html> tag for yourself, you can view the source code from your browser. On a graphical browser such as Microsoft Internet Explorer or Netscape Navigator, find "View Source" in the View menu and select it. This will open a separate window with the source code of this document, and you can then see all the tags, including the opening <html> and closing </html>. The latter will be the very last item you see, but the former will not be the first. What comes above it will be discussed shortly, together with those attributes for the opening html tag.

Contained within the html element are two important components, the head and the body. The content of the head element is mostly invisible to the user and instructs the computer on items that condition the rendering of the document. For instance, it is in the head that the style sheet appears, or an element linking the document to an external file with a style sheet. (This type of linking element, by the way, is called a link! It is specified in a standalone <link /> tag.) One element within the head is displayed, however, and should always occur in the head: a title. The text specified in the title element appears in the title bar, which is, typically, the blue or silver bar at the very top of a graphical browser window. If you do not specify a title, the title bar will usually just show the URL, which always looks lame and unprofessional and lets the world know that the page's author does not know HTML. Here is what the head element of this tutorial looks like:

    <head>
        <title>Short HTML
        Tutorial</title>
        <link rel="stylesheet" type="text/css" href="tutorial.css" />
    </head>
        

You should go ahead again and look at the source code of this document, to see the <head> and </head> tags enclosing the <title> and </title> tags and the standalone <link /> tag. The link element links to an external style sheet that specifies the code for the span and div elements illustrated above. Notice how the elements are indented to make it easy to read the code, and how this has no effect on how the text is rendered in the browser window.

Below the head element comes the body element, with nothing in between. The body element contains all the content of your document, including all the markup discussed before this section, such as paragraphs, lists, etc.

Jump back to table of contents.

Document Type Declaration

Above everything, including even the html root element, are a few lines of code that condition the browser's reading of the rest of the document, both head and body. First comes the XML declaration. This is put first because the current version of HTML, called XHTML, is an application of XML and conforms to its rules. XML, Extensible Markup Language, isn't really a language but an abstract system of rules governing markup languages in general. In addition to the XML version, the XML declaration specifies the encoding, i.e., the character set used. This is increasingly important in newer browsers, which support many character sets, including Chinese, Thai, Hebrew, and Arabic. The encoding ISO-8859-1 supports all Western alphabet characters, including those that have accents or diacritical marks. The broadest and most inclusive encoding is UTF-8, which accommodates any language currently defined for internet use, including those with non-Western characters. UTF-8 is the encoding declared for this document. If you follow my example, your XML declaration should appear as follows:

    <?xml version="1.0" encoding="utf-8"?>
        

The XML declaration is not really a tag. It opens and closes with the special XML symbols <? ... ?> rather than < ... />.

Right below the XML declaration comes the document type declaration, which is also not a tag but looks a little like one. Here is how this document's document type declaration appears:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
        

There is so much information in this item that I've split it up into two lines, but it can also be written as one. Basically, this declares that the document's root element is html, that the document type is defined and described elsewhere in a public document, that it is written in XHTML 1.0 Strict, that its definition is written in English, and where the definition of the rules of XHTML 1.0 Strict can be found online.

As part of good XHTML 1.0 Strict practice, the html root element's opening tag should also specify a couple of attributes:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
        

In case you are curious about these, the xmlns="http://www.w3.org/1999/xhtml" specifies an XML namespace. XML is so abstract and flexible that it provides for more than one set of like-named attributes in the same document, differentiated by their namespaces. The namespace is signalled by a prefix before the attribute, such as xhtml:id. This prefixing could get tedious, so, instead, you should declare a default namespace for everything within the root element, and that's what the xmlns attribute does. Next, the language of the document should be set. XHTML relies on the language set in the XML namespace rather than in its own namespace, so the lang attribute bears the xml: prefix. Its value is the standard abbreviation for the document's language, along with the regional variant, such as en-US for American English or en-GB for British English. Following this, a plain old lang attribute duplicates the same information for the benefit of older browsers that cannot interpret any XML symbols and have no provisions for namespaces at all. Well, you asked, didn't you?

If you are writing a document in XHTML 1.0 Strict, as this tutorial advises, then you can just copy the first few lines of this document and insert it at the top of any of your documents. Copy from the first line through the opening <html> tag. This ought to spare you some unnecessary tedium.
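
Assembled in order, and filled out into a complete page, those first few lines might look like this sketch. The declarations and the html tag are taken from the examples above; the title and the paragraph are placeholders, and the whole thing assumes that your document, like this one, is in American English and saved as UTF-8.

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
        <head>
            <title>Title of My Page</title>
        </head>
        <body>
            <p>The content of my page goes here.</p>
        </body>
    </html>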

By the way, every time you switch from the language of the document as a whole to some new language, you should say so in a lang attribute in the relevant element. Typically, this might be an i element, as mentioned earlier, or perhaps a blockquote element if you have a whole block of text in another language. You could also use the span or div element for this purpose. The language of every bit of your document is useful information for audio browsers, among other things, because they apply different rules for pronouncing the same characters, depending on the language.
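
For example, a French phrase inside an English sentence, and a block quotation entirely in French, might be marked up as in this sketch. The sentences are illustrative; the class name foreign is the one defined earlier in this tutorial, and pairing xml:lang with lang simply mirrors what is done on the html tag.

    <p>Her prose has a certain
    <i class="foreign" xml:lang="fr" lang="fr">je ne sais quoi</i>.</p>

    <blockquote xml:lang="fr" lang="fr">
        <p>Je pense, donc je suis.</p>
    </blockquote>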

Jump back to table of contents.

Character Entities

If you've been alert, you may have been wondering about something throughout this document: every time I have presented an illustration of HTML code, I've included the tags in it so that you could see them. But, if the tags are there, how does the browser "know" that I am merely quoting the tags for you to see them rather than using the tags to structure my text? How do I stop the browser from taking my <table> tag and creating a table instead of just displaying the tag itself for you to see it? The answer can be found if you view the source of this document. Wherever I needed to represent a tag rather than actually to use the tag, I have avoided the opening < and closing > characters. Instead, I have used a special workaround to cause the browser to display those characters in the browser window without having to enter the characters literally into my code. For the opening < character, I use the character combination &lt;. What you see when I write that is the < character, but what the browser "sees" is the four characters making up the workaround. This type of alias for the character is called a character entity, and HTML defines a character entity for every character and symbol that can be displayed in the given encoding system. All character entities begin with an ampersand (&) and end with a semicolon (;).

But how did I represent the combination &lt; and &gt; on the screen for you without those being translated automatically into the corresponding single characters < and >? As it happens, the entity for the ampersand is spelled &amp;. So you can use that to cause the ampersand to appear without writing the ampersand in your code, and that prevents your character sequence from being interpreted as a character entity to be rendered as its corresponding single character!

Character entities are also useful for "special characters" not included on a standard US-English keyboard. For instance, the combination &aacute; is used for the accented a character á. Likewise, &ouml; gives you ö, &ecirc; gives you ê, and &ccedil; gives you ç. Any standard reference on HTML will have a full list of useful character entities. See below for a reference link. You may find that this feature of HTML is so cool that you will never want to go back to plain word processing again.

So far, I have only mentioned character entities that have nice, mnemonic forms. Not all character entities have these mnemonics, but all of them do have numeric forms. In this document, I have been using the entity &#8212; for the long dash — like this. This comes in handy in transcribing Emily Dickinson's poetry.
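
The following sketch gathers the kinds of entity mentioned in this section into one paragraph of source code. The sentences are invented for illustration; the entities are the ones discussed above.

    <p>To display a tag, write &lt;table&gt; in your code.
    To display the entity itself, write &amp;lt; in your code.
    The word fa&ccedil;ade needs a cedilla, and a long dash
    &#8212; like this &#8212; is entered by number.</p>

A browser should show &lt;table&gt; as <table>, &amp;lt; as &lt;, and fa&ccedil;ade as façade.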

Jump back to table of contents.

Writing HTML

So how should you write your HTML? What tools should you use? In order for your HTML to come out correctly, you must use some method that enables you to save the file as pure ASCII text. It is best to use something like Notepad on a Windows machine or Simple Text or BBEdit on a Mac machine. Linux and FreeBSD machines have many excellent pure-text editors, such as vi, emacs, and kate. If you use a word processor such as Microsoft Word, Corel WordPerfect, or Claris Works, you run the risk of having the program insert all sorts of unseen, nonstandard, proprietary code into the text in order to support its formatting system. With a simple, pure-text editor, you can trust that the text will always be whatever it is that you have entered. On Windows machines, you can find Notepad under Programs → Accessories. If you don't mind spending about $28, you can also get a super-duper-souped-up text editor for Windows machines called TextPad, available at http://www.textpad.com. This is a splendid tool for developing HTML, and I have used it for writing this tutorial.

Popular word processing programs, such as Corel WordPerfect and Microsoft Word, offer you the option to save a file "as HTML." This is almost always a mistake. Suppose you have written something in italics. How can your word processor tell whether those italics mean emphasis or a book title or a foreign expression or a technical term? It will reduce them all to the same type of element, such as the i element. Now, suppose you want someone to build a Web program for you that searches all your pages for book titles. This is quite plausible — it might be a search engine for faculty publications or for books discussed on pages in the website. The search would turn up words and phrases that you had meant merely to emphasize, mixed in with the titles. The same goes for headers and many other elements. You should not use a word processor to write HTML because HTML is far more articulate than word processing. Word processing is about arranging your words to have a certain look on the printed page. HTML extends written punctuation to make the structure of your document explicit. You are probably better off writing the document in HTML in the first place and then having your word processor convert it into word processed text for printing.

Many commercial tools and software packages are available that are supposed to make it "easy" to create fabulous HTML pages. These include Microsoft FrontPage, Adobe GoLive!, Macromedia Dreamweaver, and Netscape Navigator. All of these products introduce serious problems, the worst and most notorious offender being FrontPage. They tend to introduce into your code various nonstandard, proprietary items that make it hard or impossible for people not using those products to maintain the document. Also, they tend to lag behind the times, and people often have older versions of these tools, so the code they produce is rarely, if ever, up to the current XHTML standards. If you do use any of these tools, it is all the more important that you first understand HTML. For instance, if you do not know HTML, you may think of a header as simply larger and heavier type rather than as a distinct element. You may not look for the menu option to create a header, and wind up just creating big, heavy type. Your software is no more capable of reading your mind than a word processor is. Now suppose that, later on, it occurs to you to put a grand header over the whole page, and subordinate all the current headers to that one. Since the previous headers are not marked as headers, you cannot do a global search-and-replace to turn <h1> into <h2> throughout the document. Instead, you will have to comb through your document line by line, looking for the big type! Or suppose you want to change the overall style and have all headers appear in blue. How can this be done if no headers are placed in their appropriate h1 or other elements? HTML is not very complicated or hard, and, with a little practice, you may find it frankly easier just to write it by hand than to learn all the menus, tricks, and gizmos of an out-of-the-wrapper product. Not to mention that writing HTML by yourself is much cheaper!

How to save and view a document was covered near the beginning of this tutorial, but let's review. Once you have written some HTML, save the file under any valid name, but with the file extension .html. The name must contain only letters, numerals, the underscore, or the hyphen. HTTP, the rules for handling HTML files on the Web and in browsers, is stricter than many operating systems, so you may not have any spaces, quotation marks, question marks, exclamation points, etc. in your file name. (Actually, the reason for this is that all those symbols are reserved for special purposes and functions. The question mark, for instance, comes between the URL proper and a string of input data coming from a form submitted online.) The file extension .html lets the browser "know" that the file is to be rendered rather than downloaded and saved and lets the operating system "know" that the file should be handled by a browser.

You should then view the results of your coding by opening the file in your browser. You can first open the browser and then, within it, open the local file (usually through the File→Open menu selection). If you get a dialog box that asks you to write the address into the blank, look for a "Browse" (Windows) or "Locate" button on the dialog. This will open up some sort of file list and you can go looking for your document there. Alternatively, you can find the file in a file list such as Windows Explorer or the My Documents list on a Windows machine and simply double click on the listing.

It is generally best to do this as soon as you have the minimal amount of HTML written that can be viewed in a browser, i.e., as soon as you have your declarations on top, the opening and closing tags of your html root element, at least some sort of head element, and a body with at least some text in it. Keep your text editor running at the same time. As you keep entering new HTML, you can then switch to the browser window and see the results. To update the browser window each time you have saved a revision of your document, select the View→Refresh menu item. Thus your work proceeds in a cycle: write/edit, save, switch windows, refresh, see results, switch windows, write/edit, save, switch windows, refresh, see results, switch windows ... This is called the development cycle. To make the development cycle easier for you, you can usually use keyboard shortcuts to toggle between windows. On a Windows machine, you toggle between application windows by holding the Alt key down and pressing the Tab key. This same combination works on Linux machines running KDE. Many browsers on many operating systems let you hold the Ctrl or Command key down while pressing the r key in order to "refresh" or "reload" the page.
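
As a reminder of what that minimal viewable document might look like, here is a sketch of an XHTML 1.0 Strict skeleton. The title and paragraph text are placeholders, and the first line (the XML declaration) is optional; I include it only on the assumption that it is one of the "declarations on top."

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- The head must contain at least a title -->
    <title>My First Page</title>
  </head>
  <body>
    <!-- Text in the body belongs inside an element such as a paragraph -->
    <p>If you can read this in your browser, the development cycle has begun.</p>
  </body>
</html>
```

Save this under a name such as my_first_page.html (a made-up name), open it in the browser, and then add to the body one element at a time, refreshing after each save.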

Validate!

When you are finished with your HTML document and it looks as if the structure is well-defined according to what appears in the browser, you should try validating the document. The World Wide Web Consortium, also known as W3C, provides free use of its validation software, which checks your HTML for any errors or problems and tells you where any of these occur, by line number and character position in the line. You can find the W3C HTML Validator at this address: http://validator.w3.org. Follow the instructions for checking a file by uploading it to the Validator. Don't get frustrated if you have some errors. (I had some in this document!) You can correct them and try again until you get the supreme satisfaction of having the Validator congratulate you and offer you the privilege of inserting a little decal into your text to show that it is valid HTML. This can give you a warm feeling inside and a sense that you deserve to celebrate with a beer or two, or perhaps a long nap.
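
For the curious, the decal is itself just a bit of HTML: a link wrapped around an image. The sketch below follows the form the Validator has customarily offered for XHTML 1.0 Strict pages. Treat the exact addresses and dimensions as an assumption and copy whatever the Validator actually gives you.

```html
<p>
  <!-- The link asks the Validator to recheck whichever page the reader came from -->
  <a href="http://validator.w3.org/check?uri=referer"><img
      src="http://www.w3.org/Icons/valid-xhtml10"
      alt="Valid XHTML 1.0 Strict" height="31" width="88" /></a>
</p>
```

Note that the decal only makes a claim. It is up to you to revalidate the page whenever you revise it.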

Since the whole point of writing HTML is to have machines read and render it appropriately, every little detail counts. Machines are very dumb and do not know how to accommodate misspellings. They are far less forgiving than English teachers. So every detail of the source code must be correct. The Validator makes it much easier for you to find your errors and correct them, because it automatically tells you where they are in your code and even suggests what's wrong with those spots. This is much better than looking at an unexpected blank screen and wondering where on earth the mistake is that caused all your lovely writing to disappear.
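
Here is a sketch of the kind of small slip the Validator catches. The file name portrait.jpg and the sentence are invented for illustration, and I make no claim about the exact wording of the messages the Validator would report.

```html
<!-- Invalid: the em element is closed after the paragraph that contains it,
     and the img element lacks its required alt attribute -->
<p>This is <em>very important.</p></em>
<p><img src="portrait.jpg" /></p>

<!-- Valid: elements close in the reverse order of their opening,
     and the image carries alternative text -->
<p>This is <em>very important.</em></p>
<p><img src="portrait.jpg" alt="Portrait of the author" /></p>
```

A browser may well display both versions the same way, which is exactly why you cannot rely on your eyes alone to catch such errors.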

"How Shall the World Be Served?"

When your document is finished and its HTML has been proven valid, it may be time to make it available to the public. To do this, you need to put it in a place where a Web server can get access to it to serve it to requesting computers. The general term server, in computerese, refers to any program that sends data (electronic signals) to any other program, called a client, when the client requests the data from the server. In the particular context of the World Wide Web, the program that plays the client role is your browser. When you point your browser to a Web address, your browser (the Web client) sends a request over the wires and connections of the Internet to the Web server. The server responds by sending the requested HTML document back to the client, i.e., your browser. You haven't had to do any of this in order to view your document until now, because you were the only person viewing your document; it hasn't been available publicly over the Internet yet.

Typically, the Web server runs on some computer (a host) that is kept on and connected to the Internet at all times and that also holds the HTML files on its hard drive. Normally, this host is far off somewhere, connected to your computer only by the Internet and not some office Ethernet cable, so the server's computer is called a remote host. (People often call the remote host itself a server by metonymy.) So your browser's (i.e., client's) request goes out to the remote host, and the server running on the remote host sends the requested resource (the HTML document) back, if it's available. This whole automated interaction is governed by standard rules called HTTP, the Hypertext Transfer Protocol.

By the way, almost every one of the players in the drama of the HTTP transaction has a variety of technical names. The Web address is also called a URL, or Uniform Resource Locator, and is also known as the URI, or Uniform Resource Identifier. Your humble browser is also called a user agent, since it acts robotically for you, the user. Perhaps the most evocative name is one of the aliases for the Web server: the HTTP daemon. To my knowledge, however, the Internet is not officially called a "dark Satanic mill."

In order for a Web server to serve your page, it must have access to it, and this usually means that you must ship it from your local machine to the remote host and save it in a directory on the remote host specially designated as the place where the server looks for it when requests come in. To do so, you usually send your HTML file to the remote host using software that follows the File Transfer Protocol, or FTP. By metonymy, the protocol name is turned into a verb: you "FTP your file to the server." In order to FTP your file, you generally need permission to save files to your remote host's Web directory, and you are issued a username and password for that purpose. You can then use a convenient commercial FTP client (yes, another kind of client: not a browser but a program that requests that the remote host connect to your local computer and allow you to send files back and forth between the two computers). Two popular FTP clients are WS_FTP and CuteFTP (which may or may not strike you as cute). If you don't have permission to upload HTML files to a remote host, you may have a webmaster in your office who is permitted to do this for you.

Aside from creating new documents, another use for HTML is to structure portions of documents that you might enter into form fields for special purposes. For instance, when you create an item on a Blackboard course website, you have the option of inserting the text as HTML. Blackboard itself provides the outer HTML code, including the html element's opening and closing tags, the whole head, and parts of the body. Blackboard inserts your new code into the body element. In such a case, you should write your HTML locally with all the necessary framework (document type declaration, head, and body elements) so as to validate it. Then, highlight all the code within the body, copy it, and insert (paste) it into the Blackboard form field, taking care to check the box for HTML.
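
A sketch of such a local draft follows. The content is invented, and the two comments merely mark, for illustration, which part you would copy. Everything outside them exists only so that the Validator has a complete document to check.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Draft of a Blackboard item</title>
  </head>
  <body>
    <!-- Copy from the line below this comment ... -->
    <h2>Reading for Week 3</h2>
    <p>Please read the first two chapters before Tuesday.</p>
    <!-- ... down to the line above this comment -->
  </body>
</html>
```

Only the h2 and p elements between the comments would be pasted into the form field, since Blackboard supplies the rest.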

Resources for Further Information

Thanks

I wish to express my deep gratitude to the participants on the Usenet newsgroups comp.infosystems.www.authoring.html and comp.infosystems.www.authoring.stylesheets for their comments, suggestions, and corrections to this document while it was (and as it continues to be) in development. They have offered me a very intensive experience of peer review and are a great source of technical information. One must always, of course, check everything one reads there against the official documents promulgated by organizations such as the World Wide Web Consortium.

Copyright © by the author, Amittai Aviram • amittai.aviram@gmail.com • www.amittai.com
Version: Saturday, 6 April 2013 11:35 PM EDT