ORIGINS
The Web was born in a laboratory in Switzerland in the early 1990s, developed by scientists seeking an easy way to store and rapidly distribute research documents around the world. They envisioned a system of computers that could store research documents and cross-reference them quickly and easily by placing "hotlinks" within the text. Programs could then be written to retrieve and display the documents and allow the reader to click on the hotlinks and quickly retrieve related documents. The documents were called "hypertext" for their ability to retrieve text from far away quickly. The Web was a quantum leap beyond earlier document retrieval systems, which only allowed documents to be accessed by selection from simple menus based on a storage hierarchy. Still, the Web is simply one of many services available on the Internet. It is certainly a vast and popular one, but just one.
HOW WEB PAGES WORK
Web pages are not written; they are constructed. The visual object that the public calls a "web page" is actually manufactured by each web browsing program (such as Microsoft Internet Explorer® or Netscape® Navigator) each time the page is displayed. A browser starts its work by interpreting a command that is entered by the user or hidden in another web page. This command is called a URL (Uniform Resource Locator). The browser uses this instruction to locate the file that you want displayed and then retrieves the file from its storage place (perhaps on a distant web server, or maybe from a diskette in your PC).
The browser must then try to recognize what type of data it has received in order to determine how to handle it. As a multimedia program, a browser is capable of handling many different forms of data, including text, graphics, audio and video (just to name a few). However, no browser is capable of displaying all forms of data. In fact, the ability of browsers in that regard is really quite limited. Computers use thousands of different coded languages (data formats) to represent the myriad forms of data they process. Even an ordinary PC may use hundreds of different languages in a month of operation. Each form of data (text, graphics, etc.) typically has many different languages that could be used to represent it in computer storage. For example, most PCs are programmed to recognize as many as ten different languages just for storing graphic data such as icons and pictures.
The specific language used to store a data file is usually indicated by the extension (suffix) appended to the end of the file's name when it is stored. For example, the extension GIF indicates that a file is stored in the popular "Graphics Interchange Format" language. Most browsers are written to recognize this language and can display (or "render") the data without any assistance from another program. For this reason, the GIF language is referred to as a "native data format" for browsers. On the other hand, many architectural drawings are stored using a "drawing" file format identified by the extension DWG. This is not a native data format for a web browser, so it cannot render these files without help from another program. Browsers can be enhanced to use additional programs called "viewers" (or "helper applications") to render some non-native data formats, but not all of them.
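The decision described above, native format or not, can be sketched in a few lines of Python. This is only an illustration: the set of "native" extensions and the function name handling_for are assumptions made for the example, and real browsers also consider the content type reported by the server, not just the file extension.

```python
import os.path

# Hypothetical set of extensions treated as native; real browsers differ.
NATIVE_EXTENSIONS = {".html", ".htm", ".txt", ".gif", ".jpg", ".jpeg"}

def handling_for(filename):
    """Return 'native' if the extension is in the assumed native set,
    otherwise 'needs viewer'."""
    extension = os.path.splitext(filename)[1].lower()
    return "native" if extension in NATIVE_EXTENSIONS else "needs viewer"

for name in ("logo.GIF", "floorplan.dwg"):
    print(name, "->", handling_for(name))
```

The extension is lowercased before the lookup, so logo.GIF and logo.gif are treated the same way.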
WEB CLIENTS (BROWSERS) VS. WEB SERVERS
All Internet software functions as one part of a pair of programs, called clients and servers, that work in unison to retrieve data from a host and present it to a user. The "client" is the program that interacts with the user. Web clients are called browsers. The "server" is the program that runs on the target host and responds to requests for data from client programs. Although web pages are usually stored on servers, browsers render them; each browser has different abilities, so the same page may be rendered slightly differently from one browser to the next.
Hardware will also affect how a browser can render a web page. For example, many computer screens have a "resolution" (dot density) of 800 pixels (dots of light) across by 600 pixels high. Other screens have a higher resolution of 1280 by 1024 pixels. Each image has a fixed size in pixels. Thus, an image that is 400 pixels wide will use half of the page width on an 800x600 screen as opposed to less than a third of the width of a 1280x1024 screen. If the reader's screen has a smaller 640x480 resolution, then the image will take up more of the screen and may also affect the layout of surrounding text. Although the author of a web page defines its content and layout, the browser and the hardware that it is running on have full control over how the page will be rendered including such details as font and window size. It is wise to view your web pages using more than one type of computer with more than one browser.
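The screen-width arithmetic above can be checked with a short Python sketch. The function name width_fraction is made up for this illustration, and the calculation assumes the browser window fills the full width of the screen.

```python
def width_fraction(image_pixels, screen_pixels):
    """Fraction of the screen width that a fixed-size image occupies."""
    return image_pixels / screen_pixels

IMAGE_WIDTH = 400  # the 400-pixel-wide image from the text

for screen_width in (640, 800, 1280):
    share = width_fraction(IMAGE_WIDTH, screen_width)
    print(f"{screen_width} pixels wide: image uses {share:.4f} of the width")
```

The same 400-pixel image uses exactly half of an 800-pixel screen, more than half of a 640-pixel screen, and less than a third of a 1280-pixel screen.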
HTML - HYPERTEXT MARKUP LANGUAGE
The HyperText Markup Language (or HTML) is the primary language used to create documents for the World Wide Web. HTML is used to define the structure of a document and to a lesser degree its format or appearance. The language was designed to define documents in a simple, portable way that could be interpreted by any kind of computer system, regardless of size or manufacturer. HyperText is a system of text that is cross-referenced, usually by storing it in separate files in separate locations, sometimes on very distant machines. The text contains hidden embedded instructions known as "tags" that are used to enhance its appearance or to place a non-text object within it when it is displayed. The tags are not displayed by web browsers, but rather are interpreted by them as instructions about how the page should be displayed. Readers of web pages actually do not see the HTML code that is retrieved from web servers, but rather see their browser's interpretation of that language. The tags can be used to provide cross-referencing information known as "hyperlinks" to other documents or to indicate positions within the text where enhancements such as boldfacing or changes in character size should be applied to the text when it is being displayed by the browser. Note that most browsers provide some menu choice that will display the true contents of an HTML file (including the tags) if desired. Read the accompanying handout entitled "HTML Examples" for illustrations of HTML code and resulting web pages that would be rendered by most browsers.
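To illustrate the difference between the HTML source and what the reader sees, the following Python sketch feeds a one-line HTML fragment to the standard library's html.parser module and keeps only the text between the tags. The fragment and the class name VisibleText are invented for this example, and a real browser does far more than this (layout, fonts, images).

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect only the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of plain text found between tags.
        self.chunks.append(data)

# Hypothetical HTML source, as it might arrive from a web server.
source = '<p>Read the <b>latest</b> <a href="report.html">report</a>.</p>'

parser = VisibleText()
parser.feed(source)
parser.close()
visible = "".join(parser.chunks)
print(visible)
```

The tags for the paragraph, the boldfacing, and the hyperlink are present in the source but absent from the text that is printed.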
HTML files are indicated on the web by the extension html (or htm) that is appended to the end of their filenames. Multimedia objects such as pictures are not actually stored inside of HTML files, but are referenced in them using tags that contain information called "hypertext references". Web browsers interpret these references to determine the location of and then retrieve multimedia objects and finally include them with the text in the file to create the resulting web page. Thus, web pages are not stored as a whole; rather they are rendered (visually constructed) each time they are viewed.
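The way a browser discovers the other files it must fetch can be sketched with Python's html.parser module. This is a simplified illustration, not how any particular browser works: the sample page, the file names in it, and the class name ReferenceCollector are all assumptions made for the example.

```python
from html.parser import HTMLParser

class ReferenceCollector(HTMLParser):
    """Record every href or src attribute found in the tags of a page."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.references.append((tag, name, value))

# Hypothetical page: the picture is referenced, not stored in the file.
page = (
    '<h1>Tennis Club</h1>'
    '<img src="images/court.gif">'
    '<p>See the <a href="schedule.html">schedule</a>.</p>'
)

collector = ReferenceCollector()
collector.feed(page)
collector.close()
for tag, attribute, target in collector.references:
    print(tag, attribute, target)
```

The image data never appears in the HTML text itself; only the reference to images/court.gif does, which is why the page must be assembled each time it is viewed.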
RENDERING OF HTML BY BROWSERS & VIEWERS
Web pages containing multimedia objects may require additional hardware to be rendered, depending on how many different forms of data (media) you plan to use and how many your system can currently manipulate. Additional programs called "viewers" may be needed to display or play back some forms of data such as non-standard graphics, music or movies. Each browser has a limited ability to render data. Some data formats will be "native" to a browser; others will not. Non-native data formats are not an insurmountable problem, though. Additional software (often free) can be used to view them. Any program that helps to render data is called a "viewer" regardless of the form of the data (even programs that play audio data are called viewers). Viewers that can be used to render or manipulate multimedia data independently from web browsers are called "helper applications". Many software companies have developed program modules (or "applets") that can be added to their browsers to increase their data literacy. Such modules are called "add-ins" or "plug-ins". They are not full stand-alone programs and cannot run without their parent program.
WWW PROTOCOLS & URL'S
A "protocol" is a standardized set of rules under which programs are developed to promote uniformity of a network service or resource such as e-mail or the World Wide Web. For example, all web software is written to conform to a protocol named HTTP or HyperText Transfer Protocol. An older protocol known as FTP (or "File Transfer Protocol") has long been used to transfer files between computers on the Internet regardless of their type or the brand of software being used. FTP clients allow users to upload and download files to and from web servers and other computers. E-mail is transmitted and routed following protocols known as SMTP (Simple Mail Transfer Protocol) and POP (Post Office Protocol). Another older protocol known as "telnet" allows users to remotely connect to and control distant machines. These are just a few of the many protocols that allow a wide variety of services to take place on the Internet. The World Wide Web is just one part of those Internet services.
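As a rough illustration of what "a standardized set of rules" means in practice, the Python sketch below builds the text of a very simple HTTP request without sending it anywhere. The function name build_get_request is made up for this example, the host and path are the illustrative ones used elsewhere in these notes, and real browsers send many more header lines than this.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.0 GET request.

    Each line ends with a carriage return and line feed, and a blank
    line marks the end of the request.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",
        "",
    ]
    return "\r\n".join(lines)

request_text = build_get_request("www.anynet.net", "/Documents/my_file.html")
print(request_text)
```

Because every web server expects a request in this general shape, any browser can talk to any web server regardless of who wrote either program.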
Web software is relatively new to the Internet. Its authors are fully aware of all of the earlier protocols. For this reason, web browsers are written to use multiple protocols and are able to interact with servers other than just web servers. Most modern web browsers can:
* - You must have access to the server and configure the web browser to know its address before e-mail or news hierarchies can be accessed. Most browsers recognize the special "mailto" instruction (see below) and thus can post articles back to the News server if you have set up the browser to know the address of an Internet mail server.
Most web browsers allow you to directly retrieve any of the resources above by entering a command called a "Uniform Resource Locator" (URL). Each highlighted "anchor" (often underlined and displayed in a different color such as blue) in a web page relates to one of these special linking instructions. There are currently seven standard protocols used in URLs, although some newer clients will recognize more. Each one designates the type of Internet resource being used. A table showing the syntax for each appears below, followed by examples of the use of each one.
| Protocol | Uniform Resource Locator (URL) Syntax |
|---|---|
| Hypermedia | http://hostname:port#/directory_path/document_name.html |
| Secured HTTP | https://hostname:port#/directory_path/filename.html |
| Gopher Item | gopher://hostname:port#/directory_path/menu_selector |
| Remote FTP | ftp://hostname:port#/directory_path/filename |
| Local File | file:///directory_path/document_name |
| -or- | directory_path/document_name.html |
| Remote Login | telnet://hostname:port# |
| Send E-mail | mailto:email_address |
| Newsgroup | news:newsgroup.hierarchy.name |
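The parts named in the table (protocol, hostname, port, and path) can be pulled out of a URL with Python's standard urllib.parse module. This is a small sketch rather than a description of how a browser parses URLs; the helper name describe is an assumption, and the sample URLs are the illustrative ones used in these notes.

```python
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into the parts named in the syntax table."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,  # None when the URL has no host part
        "port": parts.port,          # None when no port number is given
        "path": parts.path,
    }

samples = [
    "gopher://gopher.anynet.net:70/Business/Current%20Projects",
    "telnet://main.anynet.net:3000",
    "mailto:smithj@ircc.net",
    "news:rec.sport.tennis",
]

for url in samples:
    print(describe(url))
```

Note that the mailto and news forms have no hostname or port at all; everything after the colon is treated as the path.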
URL EXAMPLES
Note that the use of upper and lowercase in the commands below may be critical. Some web servers are case sensitive; others are not. It is best practice to always type the URL just as you saw it written.
Remote Web Pages (Hypermedia):
If you want to retrieve a hypermedia document named my_file.html in a directory named Documents from a host named www.anynet.net , enter the URL:
http://www.anynet.net/Documents/my_file.html
Gopher Items:
If you want to retrieve a Gopher item named "Current Projects" on a menu named "Business" from a host named gopher.anynet.net, enter the URL:
gopher://gopher.anynet.net:70/Business/Current%20Projects
Note the use of the hexadecimal (Base 16) code %20 in place of the blank space in the URL above.
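The substitution of %20 for a blank space can be reproduced with Python's standard urllib.parse module. This is a brief sketch for illustration; the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

item_name = "Current Projects"

# The character code of a blank space is 32, which is 20 in hexadecimal.
space_code_hex = format(ord(" "), "X")

encoded = quote(item_name)   # replaces characters that are unsafe in a URL
decoded = unquote(encoded)   # reverses the substitution

print(space_code_hex)
print(encoded)
print(decoded)
```

The same scheme applies to other characters that cannot appear literally in a URL: a percent sign followed by the character's two-digit hexadecimal code.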
Telnet - Remote Logins:
If you want to remotely login to port number 3000 on a host named main.anynet.net , enter the URL:
telnet://main.anynet.net:3000
Newsgroup Articles:
If you want to view the newsgroup hierarchy rec.sport.tennis , enter the URL:
news:rec.sport.tennis
FTP Remote File Retrievals:
If you want to retrieve a file named "readme.txt" in a directory named Documents from a host named ftp.anynet.net on which you do not have an account, enter the URL:
ftp://ftp.anynet.net/Documents/readme.txt
If you want to retrieve a file named "readme.txt" in a directory named Documents from a host named ftp.anynet.net on which you have an account named myacct and a password of pword, enter the URL:
ftp://myacct:pword@ftp.anynet.net/Documents/readme.txt
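The difference between the two FTP URLs above can be seen by splitting them with Python's standard urllib.parse module. This is a minimal sketch; the helper name ftp_login is hypothetical. When no account is given in the URL, FTP clients conventionally log in under the name "anonymous".

```python
from urllib.parse import urlsplit

def ftp_login(url):
    """Return (account, password, hostname, path) from an FTP URL.

    Account and password are None when the URL does not include them.
    """
    parts = urlsplit(url)
    return (parts.username, parts.password, parts.hostname, parts.path)

print(ftp_login("ftp://ftp.anynet.net/Documents/readme.txt"))
print(ftp_login("ftp://myacct:pword@ftp.anynet.net/Documents/readme.txt"))
```

Keep in mind that a password written into a URL is visible to anyone who can see the URL.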
"Absolute Reference" to a Local File:
If you want to retrieve a hypermedia document named my_file.html from a specific directory named Documents on Drive C: of a local computer, enter the URL:
file:///C|/Documents/my_file.html
Notice the use of the vertical bar ( | ) in place of the colon following the drive letter. Older browsers required this substitution; many current browsers also accept the colon itself.
"Relative Reference" to a Local File:
If you want to retrieve a hypermedia document named my_file.html from the same directory as the last document, enter the URL:
my_file.html
Notice the lack of any protocol specifier (such as http:) in front of the command.
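How a relative reference is completed from the address of the last document can be sketched with the urljoin function in Python's standard urllib.parse module. The name of the previous document (index.html) is an assumption made for this example.

```python
from urllib.parse import urljoin

# Hypothetical address of the document that was viewed last.
last_document = "http://www.anynet.net/Documents/index.html"

# A relative reference: no protocol, no hostname, no directory.
relative_reference = "my_file.html"

full_url = urljoin(last_document, relative_reference)
print(full_url)
```

Everything that the relative reference leaves out (protocol, hostname, and directory) is borrowed from the address of the last document.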
E-mail from a Browser:
If your browser can send e-mail, you could send a message to a user named smithj on a host named ircc.net with the URL:
mailto:smithj@ircc.net
WEB PAGE STORAGE & HOME PAGES
HTML files and the other multimedia objects that are used to construct a web page must be stored on a device that is accessible to the browser program. This device is typically a magnetic disk that is part of a PC or a web server attached to the Internet. A "web site" is a collection of HTML files and related data objects (such as images, movies, or programs) that are meant to be used as a group. Such sites often involve the use of multiple files and storage folders (also known as "directories") that are organized in a storage hierarchy known as a "directory tree". The folder that is used as the starting point for the group of files is called the "parent folder". Subordinate folders are referred to as "children" of that parent.
One or more web sites are typically stored on a dedicated web server, but a web site can be stored on a simple PC and viewed using a web browser located on the same machine. Rather than allowing users to view the entire contents of a folder at will, an author often creates a special HTML file called an "index" file in the folder. This file serves as a table of contents or index to the specific files that the author wants to offer to the reader. Most web servers will not send readers a directory listing of a folder's contents, but rather send its index page whenever a user provides a URL that stops with the folder's name. Some servers will simply return an error message if the user types an incomplete URL. This gives the author control over exactly what files can be viewed.
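The index-file behavior described above can be sketched as a small Python function. This is an illustration under stated assumptions, not the logic of any particular web server: the file name index.html, the function name file_to_send, and the folder set are all made up for the example, and the actual index file name depends on how the server is configured.

```python
INDEX_NAME = "index.html"  # assumed name; set by the server's configuration

def file_to_send(url_path, folders):
    """Return the file a server would send for the requested URL path.

    If the path stops at a folder, the folder's index file is sent
    instead of a listing of the folder's contents.
    """
    trimmed = url_path.rstrip("/")
    if url_path.endswith("/") or trimmed in folders:
        return trimmed + "/" + INDEX_NAME
    return url_path

site_folders = {"/Documents"}  # hypothetical folders on the web site

for requested in ("/", "/Documents", "/Documents/", "/Documents/my_file.html"):
    print(requested, "->", file_to_send(requested, site_folders))
```

A request that names a specific file passes through unchanged; only requests that stop at a folder are redirected to that folder's index file.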
The index file is often referred to as a folder's "home page", but this term actually has many meanings. In the most general sense, a home page is a web page that is meant to be viewed as a starting point when viewing a web site. The term is used in three different senses:
The home page of the public IRSC web server is retrieved using the URL:
http://www.irsc.edu/
The home page of the faculty IRSC web server is retrieved using the URL:
http://faculty.irsc.edu/
Notice that most sites name the machine that acts as their web server "www", although this is not a rule. For more information about using a web browser, see the web page about How to find your way back to valued locations on the Web.
For information on Web Page Authoring, read the Web Page Authoring Home Page at
Last Revised: 27 September 2010 | © 2010 Randolph Gibson
www.gibsonr.com/classes/internet/wwwnotes.html | E-mail: rgibson@irsc.edu