The braver amongst you may decide you want to create your own web pages. It's easy to make a start, and these notes will help you do that and give you some pointers. However, the whole subject is vast, and if you intend to become expert at authoring web pages I suggest you familiarise yourself with the basics and then buy a suitably expert book.
Web pages are HTML documents. HTML stands for "HyperText Markup Language". That is, it is a language used to "mark-up" documents for display by a browser. One of the most important points to understand here is that each browser is free to implement or ignore a given mark-up as it sees best.
Thus a "header" may be shown as larger and bolder on a PC, but on a text terminal it might be shown in reverse video.
As an author of web pages it is crucial to remember that everyone is going to see your page slightly differently.
HTML pages consist of normal text with "tags" added for the markup. Tags are key words contained between angle brackets. Often tags come in pairs with some text between the two tags. In such cases the closing tag has a slash character (/) after the opening angle bracket. The text between the two tags is thus effectively "marked-up" by the two tags.
For example
Some of this <b>is in bold</b>
The end tag doesn't have to be on the same line, and generally browsers ignore the use of white space in the source document. Care should be taken to make sure that each paired tag has a matching end tag. Tag pairs can be nested, and it is good practice to close them in the reverse of the order in which they were opened, i.e.
<b>Bold and in <i>italics</i> </b>
instead of
<b>Bold and in <i>italics </b> </i>
You might get away with the second usage, but it's bad practice, and depends on the browser as to how it reacts.
The overall structure of an HTML document should be
<HTML>
<HEAD>
.
. (other tags that belong in the header) .
</HEAD>
<BODY>
.
. (other tags and text that belongs in the main body) .
</BODY>
</HTML>
The HTML standard is maintained by the W3 consortium. Visit The WWW consortium for up to date chapter and verse on what HTML is. You'll also find definitive lists of standard HTML tags there.
Here is a brief list of the most commonly used markups. A fuller list can be found at (amongst others) http://www.htmlgoodies.com/html_ref.html
The <TITLE>..</TITLE> tags go in the <HEAD>...</HEAD> section of the document. The text marked up in this way becomes the document's title, shown at the top of the window.
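As a rough sketch, a minimal complete page might look something like the following. The title and body text are made up purely for illustration.

```html
<HTML>
<HEAD>
<!-- Shown in the browser's title bar, not in the page itself -->
<TITLE>My first web page</TITLE>
</HEAD>
<BODY>
This text appears in the main browser window.
</BODY>
</HTML>
```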
The <b>..</b> <i>..</i> and <u>..</u> markups produce bold, italics and underlining effects. Note, hyperlinks are underlined automatically.
Recently there has been a move away from the bold and italic tags to <strong>..</strong> and <em>..</em> markups. The former are known as physical markups since they describe physical characteristics. If a browser cannot do italics or bold, then those markups will be ignored.
These newer markups are called "logical" markups as they tell the browser the degree of emphasis wanted. This leaves the browser free to choose how to achieve this effect.
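To compare the two styles side by side, here is an illustrative fragment (the wording is invented). Many graphical browsers will probably draw both lines in much the same way, but with the logical markups that is the browser's choice rather than the author's.

```html
<!-- Physical markup: asks for a specific appearance -->
This is <B>bold</B> and this is <I>italic</I>.
<BR>
<!-- Logical markup: asks for a degree of emphasis -->
This is <STRONG>strongly emphasised</STRONG> and this is <EM>emphasised</EM>.
```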
Browsers treat any run of white space in a source document, line breaks included, as a single space. This is to allow paragraphs of text to re-flow as the browser window is resized.
If you want a line break, the <BR> tag tells the browser to do just that.
The <P>..</P> markup is used to mark up paragraphs. It is quite common for the </P> to be omitted; however, as more attributes are added to the <P> tag it may become important to supply a </P> tag to mark the end of the specified effect.
The <HR> tag puts a horizontal line across the page.
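The paragraph, line break and horizontal line tags might be combined along the following lines. This is only a sketch with invented text, intended to be pasted into the body of a page.

```html
<P>This is the first paragraph. However the source is
laid out, the browser will re-flow it to fit the window.</P>
<P>This paragraph has a forced<BR>
line break in the middle of it.</P>
<HR>
<P>This paragraph appears below a horizontal line.</P>
```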
The <A> .. </A> tags can be used to define anchor points and hypertext links. These tags always require extra arguments in the opening tag to define the link.
This is discussed more fully in "Adding hyperlinks to web pages".
The <IMG> tag can be used to add pictures to your page. The basic definition is something like
<IMG ALT="text description" HEIGHT=size WIDTH=size SRC="URL">
The ALT attribute is a text description displayed whilst the image is loading. This helps to give the viewer an idea of what's coming before it arrives. It's a good idea to always include an ALT attribute for the following reasons
- It will be shown whilst the image is still loading
- It will be shown even when the user has switched images off
- It can be understood by browsers used by the partially sighted
- In recent browsers it is shown as a "tooltip" whenever the mouse is moved over the image.
The HEIGHT and WIDTH attributes specify the display size of the image in either pixels or percentage of screen size. This allows the browser to reserve space whilst the image downloads, allowing the text to be displayed faster. It won't make the download faster, but it will seem faster. It will also preserve the page layout if you switch images off. You don't need to supply both.
If you don't do this, the browser either has to wait until it's got the image, or it has to completely redraw the page once the image arrives. Neither is particularly nice.
Note, the HEIGHT and WIDTH need not be the original size of the image, and people sometimes think, wrongly, that they can speed up the download by making the HEIGHT and WIDTH smaller. In fact the download is just as slow, but it gets drawn smaller. If you want a page with small pictures it's usual to make smaller copies and link those to the larger originals. These small pictures are often called "thumbnails".
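A thumbnail linked to its original could be marked up roughly as follows. The file names and sizes are hypothetical; the assumption is that holiday_small.gif is a separately made, reduced copy of holiday.gif.

```html
<!-- The small copy is what downloads with the page;
     the full-size original is only fetched if the link is selected -->
<A HREF="holiday.gif"><IMG ALT="Holiday photo (select for the full-size picture)"
   HEIGHT=60 WIDTH=80 SRC="holiday_small.gif"></A>
```

Because the thumbnail is a genuinely smaller file, the page really does download faster, which is not the case when HEIGHT and WIDTH are simply set to small values on the original image.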
You can get away with specifying just one of HEIGHT or WIDTH. The browser will set the other to scale.
The SRC attribute gives the URL where the image file can be found.
Only the SRC attribute is really needed, but supplying ALT, HEIGHT and WIDTH is a good habit to get into.
Both Netscape and Microsoft have invented non-standard HTML tags to give their browsers added features. Some of these have subsequently been adopted into standard HTML. Generally it is a very bad idea to use these extensions as it means you are forcing your audience to use one browser over another.
In the early days all the Netscape extensions became the de facto standard. This situation is very unlikely to occur again.
Hyperlinks are added to web pages using the Anchor tags <A> .. </A>. There are two basic methods for using the anchor tag. One creates an "anchor point", that is a point that a hyperlink can jump to, and the other creates the hyperlink itself.
For example
<A NAME="AnchorPoint">This is an anchor point</A>
creates an anchor point called "AnchorPoint" in the current document. The text between the <A>...</A> tags will appear as normal.
By contrast the markup
<A HREF="#AnchorPoint">Goto the Anchor point</A>
creates the hyperlink that will take you to the first point. In this case anything between the <A> and </A> tag is highlighted, and may be selected to activate the link. This can include images.
The HREF part of this tag is in fact a URL. In this context a URL is fully specified as
<resource type>://<machine name>/<directory>/<filename>#<anchor point>
where
<resource type>   normally "http"
<machine name>    The Internet node the resource is on. If omitted, the current machine is assumed.
<directory>       The directory path on the machine that the resource file lives in. If omitted, the current directory is assumed.
<filename>        The file that is to be viewed. If omitted, the current file is assumed.
<anchor point>    The location within the file that the browser is to go to. If omitted (or if invalid), the top of the file is assumed.
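To illustrate how the parts of a URL can be left out, here are a few sketched links. The machine name www.example.com, the file guide.html and the anchor point "Colours" are all invented for the example.

```html
<!-- Fully specified: resource type, machine, directory, file and anchor point -->
<A HREF="http://www.example.com/docs/guide.html#Colours">Colours, in a guide on another machine</A>

<!-- Resource type and machine omitted: the current machine is assumed -->
<A HREF="/docs/guide.html">The guide on this machine</A>

<!-- Directory omitted as well: the current directory is assumed -->
<A HREF="guide.html#Colours">Colours, in a guide in this directory</A>

<!-- Only the anchor point given: the current file is assumed -->
<A HREF="#Colours">Colours, further down this page</A>
```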
Note:
In addition to adding images to your web pages, you can change the colours used on your web page by adding attributes to your <BODY> tag as follows:
<BODY BGCOLOR="#110000" LINK="#009900" VLINK="#0000CC" ALINK="#FFFFFF" TEXT="#000000">
Where
BGCOLOR   Is the background colour of the page
TEXT      Is the colour of the text
LINK      Is the colour of an unused link
VLINK     Is the colour of a visited link
ALINK     Is the colour of a link as you visit it
In each case the colour is specified as a set of three hexadecimal numbers that express the red, green and blue component of the colour.
In hex, each digit can be 0..9 or A..F, so for each colour component you have a theoretical range of "00" to "FF", or 0-255 in real money.
If you're not familiar with hexadecimal, think of "F0" as a two digit number, in which case you'll see that the sequence of numbers goes
00, 01, 02... 09, 0A... 0F, 10, 11, ...1A...1F, 20.... ..... FF
On this scale "F0" is pretty high, whilst "0F" is pretty low, that is it's the first digit that is most significant, just as it is in base 10.
In the above case we have
BGCOLOR   11 00 00 = (17,0,0)        i.e. a dark red
LINK      00 99 00 = (0,153,0)       i.e. a medium green
VLINK     00 00 CC = (0,0,204)       i.e. a bright blue
ALINK     FF FF FF = (255,255,255)   i.e. brilliant white
TEXT      00 00 00 = (0,0,0)         i.e. black
Be careful not to have two colours the same, as this will make something go invisible.
There are plenty of colour palettes for you to use on the Net. For example visit http://www.concentric.net/~noree643/colors/contents.html
There are an increasing number of web editing tools around. These usually offer ease of use, better graphics handling and WYSIWYG functions.
However really simple web pages can be created with just a text editor and a small HTML reference book.
You pays your money and takes your choice.
The quickest way to test the layout of your pages is to view them straight from your own hard disk. This will save you upload time to your server machine, and can be done to a large extent off-line, disconnected from the Net.
To do this, save your file to disk and open a browser window. Instead of entering a web URL, select an "open file in browser" option. The location of this option will vary according to the browser you are running.
When you do this the location will be something like
file:///c|/directory/ (whatever)
instead of the usual http: address. If you are going to edit this frequently it will be a good idea to bookmark this location for future use.
View the page as normal in your browser. You should be able to check the layout and appearance of the page, but you may not be able to test some of the links unless you are connected to the Internet.
If necessary, go back to your editor and make any changes and save the file again.
You can now view the changed version of your file by selecting the reload or refresh option in your browser.
Note, you probably don't need to exit either your browser or your editor in doing this. This makes development a lot faster. In some cases the browser and editor are even part of the same software package.
Once you are happy you should upload your new file(s) to the server.
Note, when you upload your file you may find that some of your links that work on your own machine may not work when you've loaded the page onto the Internet. This is a really common fault, so make sure you always view your pages after uploading to the Internet, preferably from a different machine.
The usual cause is that you've forgotten to also copy the files referenced (e.g. image files), or that the relative links used are invalid on the Web (e.g. files in sub-directories on your machine are in the same directory when loaded to the Web). It is always a good idea to organize your local directories to exactly match the target configuration on the web.
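The difference between a link that survives uploading and one that doesn't can be sketched as follows. The directory and file names are invented for the example.

```html
<!-- Relative link: works from the hard disk and from the server, provided
     the images sub-directory is uploaded alongside this page -->
<IMG ALT="Company logo" SRC="images/logo.gif">

<!-- Link to a location on the author's own disk: fine when testing locally,
     but broken for everyone else once the page is on the Web -->
<IMG ALT="Company logo" SRC="file:///c|/website/images/logo.gif">
```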
There are lots and lots and lots of on-line web pages dedicated to teaching people HTML. I'm not even going to start to suggest one.
Go to AltaVista and type something like
+learn +HTML +beginners
and pick one of the 5000+ sites you find. You can refine your search by adding more keywords.
Similarly there are lots and lots and lots of computer books. However this is more problematical as computer related books tend to be large and expensive.
HTML books come in several forms:
One problem with such books is that they lose their edge when the next version of HTML comes out.
My advice would be to attempt to struggle through an on-line course, and learn by example. Depending on how easy or hard you find that, choose an appropriate book.
The best way to pick up tricks and learn a new (computer) language is to see how it's currently being used. Fortunately this is very easy in HTML as most browsers will have a View... Source... option, and some will have a save to disk option, allowing you to study the file at your leisure.
If you see a web page with a feature you want to understand, try just looking at the source.
Unfortunately as the language gets more and more sophisticated and more is done via HTML extensions this can be harder to do. Another fact working against you doing this is that more and more pages are written from HTML editing software, rather than "by hand". Such pages are harder to view sensibly because they use far too many features (special fonts etc), and create very long source lines.
You should be aware that if you see <APPLET> or <SCRIPT>..</SCRIPT> tags then some active content is being supplied by means other than just HTML, namely Java and JavaScript.
It's perfectly possible to write incorrect HTML and not be aware of it. This is because each browser can choose to handle errors as it sees fit, which means that a page that looks okay in your browser may fail, or look bad, in someone else's.
To get round this problem it is a good idea to try to view your pages with as many different browsers and versions of browsers as you can.
Another good idea is to run your page(s) through an HTML syntax checker. There are a number of these available, and there are even some freely available on-line.
For example, visit the "Dr. HTML" site that will test your web pages for errors that might trip certain browsers up.
http://www2.imagiware.com/RxHTML/
The "horse's mouth" in standard HTML is the W3 consortium, e.g.
Another good guide to HTML is at
If you want to ensure your web pages are accessible to those who are disabled (e.g. the visually impaired or blind), there are some guidelines and an on-line checker at
If you see a web site that you don't like, work out why (view source if need be). A lot of liking or disliking web pages is a matter of personal taste, but for a tutorial in what can go wrong, visit http://www.webpagesthatsuck.com/
This offers a tutorial on web page design by counter-example, showing all the things that (in the author's view) you shouldn't do.
It's quite an interesting site, and makes several good points.
Things to look out for include
Learning more advanced HTML is beyond the scope of these notes. Here is a brief list of more advanced features supported by HTML that you may see referenced.
More recent versions of HTML support the use of tables. This allows the author some control over the actual layout of elements on the screen relative to each other.
One consequence of using tables is that the contents of the table won't usually be drawn until all the table data has been fetched (until then the table size may not be known).
This can make tables appear slower. Whether or not they really are much slower I couldn't say.
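For the curious, a very small table might be marked up along these lines. The contents are invented; <TR> starts a row, <TH> marks a heading cell and <TD> a data cell.

```html
<TABLE BORDER=1>
<TR><TH>Tag</TH><TH>Purpose</TH></TR>
<TR><TD>BR</TD><TD>Line break</TD></TR>
<TR><TD>HR</TD><TD>Horizontal line</TD></TR>
</TABLE>
```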
HTML supports the use of <FORM> and <INPUT> tags to allow the author to define data entry fields into which the user can type information.
By themselves they are useless, but HTTP allows this data to be "submitted" to a URL, typically a CGI script on the server from which the web page was downloaded.
This was HTML's first step towards becoming interactive, and is the sort of technique used on all the search engine forms one uses.
HTML does not allow error checking on the data entered into these fields, so that it would be possible to submit invalid data to the server.
This is inefficient, and often you will find that a page uses JavaScript to check all arguments before forwarding the request.
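A simple search form with a JavaScript check could be sketched as below. The script URL /cgi-bin/search, the field name "keywords" and the function name checkQuery are all hypothetical, and the check shown (refusing an empty field) is only one example of the kind of validation a page might do.

```html
<SCRIPT TYPE="text/javascript">
// Called when the form is submitted; returning false cancels the submission
function checkQuery(form) {
  if (form.keywords.value == "") {
    alert("Please enter at least one keyword.");
    return false;
  }
  return true;  // the data is sent to the URL named in ACTION
}
</SCRIPT>

<FORM ACTION="/cgi-bin/search" METHOD="GET" onSubmit="return checkQuery(this)">
Keywords: <INPUT TYPE="text" NAME="keywords" SIZE=30>
<INPUT TYPE="submit" VALUE="Search">
</FORM>
```

Browsers with JavaScript switched off will skip the check and submit the form anyway, so the script on the server still needs to cope with invalid data.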
Frames allow the browser to divide the screen into sections, with an HTML page in each section. A good use of frames is to place a navigation bar in a fixed frame, and use hyperlinks to update the contents of the rest of the screen.
There are many bad uses of frames.
One problem with frames is that the familiar "back" and "forward" functions in the browser can become ambiguous (which frame is to go back).
Because of this frames are not as popular as one might otherwise have expected.
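The navigation bar arrangement described above might be set up roughly as follows. The file names navbar.html and welcome.html and the frame names are invented for the example.

```html
<HTML>
<HEAD>
<TITLE>A framed site</TITLE>
</HEAD>
<!-- Two columns: a narrow fixed navigation bar and the main display area -->
<FRAMESET COLS="20%,80%">
  <FRAME SRC="navbar.html" NAME="nav">
  <FRAME SRC="welcome.html" NAME="main">
  <!-- Shown only by browsers that don't support frames -->
  <NOFRAMES>
  <BODY>
  Your browser doesn't show frames. Try the
  <A HREF="welcome.html">no-frames version</A>.
  </BODY>
  </NOFRAMES>
</FRAMESET>
</HTML>
```

In this sketch the links inside navbar.html would need a TARGET="main" attribute so that the pages they load appear in the larger frame rather than replacing the navigation bar.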
CGI scripts are programs that reside on a web server. Usually these handle particular requests "submitted" from an HTML form. The normal practice is to execute some calculation and dynamically construct an HTML page that is sent back to the client browser as a response.
This is how search engines work. They receive your keywords as data entered into a form, use this to locate any entries in their database that match, and then construct an HTML page containing this data which is then sent back to the originator, often with an advert bundled in.
It's important to know your audience when composing a web page. If you don't cater for their needs and desires, no-one is going to view your pages and you're wasting your time.
People use different browsers, and different versions of browsers. Try not to write web pages that will only work for one. If you do, consider placing a warning on the top page, and offering a low-tech alternative (e.g. a no-frames version, or a non-Java version).
People have different size monitors with different resolutions. On top of this they resize their windows.
Bear this in mind, try viewing your page in different sized windows both very large and small.
Always make sure a page has enough links. Don't go mad, but imagine the user has gone straight to this page and ask yourself what links you want to give them from here.
It's no good saying "go back to my home page" if they haven't just come from there or "email me at the address on the previous page". Better is to provide a link back to the home page, and a "mailto" hyperlink.
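Such a pair of links at the foot of a page might look like this. The file name index.html and the email address are placeholders.

```html
<A HREF="index.html">Back to the home page</A> |
<A HREF="mailto:someone@example.com">Email the author</A>
```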
A lot of people view pages with no graphics. Either because they have a text only browser like Lynx, or because they are on a slow, costly modem link.
View your page with graphics off. Make sure the page still makes sense, and that all the <IMG> tags have HEIGHT, WIDTH and ALT attributes. These are important even if the user has graphics on, as on a slow link they determine what the user sees whilst the images are still downloading.
If you use advanced features such as these, consider offering a low-tech equivalent or be prepared for people to turn their back on your page.
Unless you want to be a mirror site don't copy the same web page to many different locations. This is both inefficient and confusing, as the file gets updated in one place and not another.
Instead keep it in one place and link to it from the others. That's what the web's there for, after all.
Keep revisiting your pages, and make sure they are kept up to date. People are not interested in pages that have clearly become out of date.
All web sites are always under construction. So don't make a big deal of it, it irritates after a while.
Normally Web pages are "published" by uploading them to your web server. The mechanics of this will depend on your site, and you'll need to contact your system administrator or ISP for advice.
Often a limited form of FTP may be involved.
Okay, you've produced a wonderful web page, and put it on the Net. Now what?
Try to link it into the Web. This usually means at least linking it into your own pages (if you already have some), and hopefully persuading others that the content is sufficiently useful that they should link to it.
This isn't as daft as it sounds. In the same way that you should tell people your telephone number and email address, so too you might wish to tell them your web address. People often link to their friends' web pages, and daft as it sounds, sometimes people follow those links.
At the very least, get someone you trust to view it and criticise it.
If you have a business card, put your details on that if it's suitable.
Write it with your email address inside your Christmas cards. Come January you may be surprised at the people who contact you.
If your page is likely to be of interest to a particular Internet community then announce it.
This can be done by posting an article to appropriate Usenet newsgroups. Most newsgroups are reasonably tolerant of such announcements, but be honest with yourself in deciding whether or not other people will be interested.
If your page is itself advertising some resource or product that you are making available, then there may be newsgroups dedicated to announcements of that type. You'll need to check out the hierarchy and locate a suitable ".announce" newsgroup. Such newsgroups are sometimes moderated and often don't welcome commercial posts, so it may be a good idea to check on the group's charter to check your announcement is allowed.
Publicize it any way you think is legitimate and suitable. If it's a commercial page, add a URL to your normal advertising. This is already common practice.
Add the URL to the address portion of your stationery.
Of course the real way of getting your page found is to add your URL to the same search engines that you use yourself to locate pages.
Most search engines allow you to submit your URL for inclusion in their database. The normal sequence of events is
Most engines will tell you how long they expect "finally" will be. This can be from a few days to several weeks.
Once you have been indexed, most engines will do some or all of the following
This is important because in some cases (e.g. AltaVista) if the search engine doesn't find your page (e.g. because your server is down) it will remove you from the index.
In this way you can "drop out" of an index, so it's worth checking periodically to make sure you're still there.
This is useful, as it means you need only submit the top page for the whole site to be indexed.
However, the robot will only index your site if the robots policy file allows it to (this is for your site administrator to set up).
Furthermore, the robot often lags weeks behind the main indexing process, so your submitted URL may appear in the index long before the rest of your site does.
Note, some search engines read Usenet looking for URLs and start indexing and crawling over those sites. In this way you can sometimes be indexed without even having to submit your URL explicitly. This is another example of anything you post becoming knowledge in the public domain.
If you really want everyone to be able to find your page, you naturally want to submit it to as many search engines as possible. This is a service that can be bought commercially, though there are still a number of free services such as Submit-It (see http://www.submit-it.com/default.shtml ).
In these services you typically have to fill in a form, and then go through the process of choosing the search engines you wish to submit to. Many search engines are geographically based or topic based, so you will need to be careful only to submit to appropriate locations.
Unfortunately, you often have to categorize your URL for sites such as Yahoo, and each site has its own system. This can make the whole process time consuming and a bit daunting. This is presumably one of the reasons people charge for this service.
This will vary from one search engine to another, but basically your web page will be indexed by content in a number of ways. Things that may get recorded are
When a user searches for a page, the information collected above will be matched against the search request. Usually the search engine will attempt to rank all matches in order. In some cases the user can specify what weights they wish to give to the different factors (e.g. date etc).
Most search engines can find thousands of entries to match simple queries, but will only show the first 10-50 and then the next 10-50 etc.
Obviously, if your page isn't up there, no-one's going to select it. There are a number of tricks you can use to improve your rating.
Choosing the right keywords is a bit of an art. Try not to use inappropriate keywords as drawing people to your page for the wrong reasons will simply irritate them.
Keep the description below 20-odd words as this is all most search engines will show.
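Assuming the keywords and description mentioned here are the ones supplied in <META> tags in the document header, which a number of search engines read, a sketch might look like this. The subject matter and wording are invented.

```html
<HEAD>
<TITLE>Growing tomatoes in a cold climate</TITLE>
<!-- Words people are likely to search for, and that genuinely describe the page -->
<META NAME="keywords" CONTENT="tomatoes, greenhouse, gardening, cold climate">
<!-- Kept short, as search engines may only display the first few words -->
<META NAME="description" CONTENT="Practical advice on growing tomatoes outdoors and under glass in a cold climate.">
</HEAD>
```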
You can measure how well you rate by visiting
This searches the search engines to see what position (if any) your page comes in.
Once your page is up and running it'd be nice to see who visits it. Nice, but sadly not normally possible.
You can, of course, invite feedback on your pages by supplying email hyperlinks or a form to fill in and send. I'm not sure how useful this is. Anyone moved to feedback is either going to be overly positive or negative. You're not likely to get a measured critique of your pages, though you will get some constructive criticism.
Depending on who provides your Internet access, you may or may not be allowed to add a counter to your page. Counters are usually small graphics that are dynamically calculated by the web server. Typically your service provider will give you the URL to use as the image, and when displayed this will appear as a number.
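A counter of this kind might be added with something like the following. The SRC shown is entirely invented; the real URL, and any arguments it takes, must come from your service provider.

```html
<!-- The server works out the number and returns it as an image -->
You are visitor number
<IMG ALT="[counter]" HEIGHT=20 WIDTH=80 SRC="/cgi-bin/counter?page=index">
```

Supplying HEIGHT and WIDTH here lets the rest of the page be laid out without waiting for the counter to arrive.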
Web counters are nice (at least they give you a warm glow if they go up regularly), but they have their downside. For a start they are images. On a text based page this can often mean the last and slowest thing to load is the counter, which is a little self-defeating.
Also you should be aware that your counters underestimate the number of visits, but include your own visits. Undercounting occurs
If you visit a site with a very large counter, remember that these things are not always started at zero. Even if you go back the next day, it's still possible that the increase has been artificially inflated.
At best these things are only ever approximate, and they are more use to you than to your visitors.
Most servers keep a log of all accesses to the site, though of necessity this information is discarded every so often.
If you are technically minded and have access to this information, you can collate all sorts of stats, and get lists of which Internet Nodes visited which web pages.
This can be quite interesting information, and often server managers will make this information available in the form of a statistical report.
However, you still need to bear in mind that
Still... it's still pretty interesting information for us nerds.
A number of monitoring services, free or otherwise, have started to appear. These usually entail adding some HTML to your page linking to their server. They then report to you on the hits. The comments made in 8.8.3 still apply, but if you don't have access to your server logs, or the time to analyse them, this is something worth considering.
An example can be found at
© 1997-1999 John A Fotheringham and JafSoft Limited
Last Minor Update: 4 December '99