The Internet

An overview of how the Internet is organized and how communications take place using it. Major Internet services are covered, including their purposes, their protocols, and software and conventions associated with them.

Intro to the Internet

Table of Contents

A Superficial History
What is the Internet?
Internet Services

FTP
Telnet
Email
Usenet News
BitNet
IRC, Talk, CU See Me
World Wide Web
FAQ's
Glossary


A Superficial History of the Internet

Year Development
1957

The launch of the Sputnik satellite by the Soviet Union spurred the creation of ARPA (the Advanced Research Projects Agency of the United States Department of Defense).

1969

ARPA commissioned the creation of ARPANet, a packet-switching network to allow scientific colleagues and their support teams to share information and research results. Throughout the late 60's design papers were presented and in 1969 the first node was created at UCLA. The network soon expanded to include three more sites.

1971

Grew to 15 nodes (23 hosts).

1974

Telnet begun.

1981

BITNET created.

1983

TCP/IP officially replaced older less powerful protocols. In the same year the network split into MILNET (military) and ARPAnet (research and education).

1988

The Internet worm brought the Internet to a virtual standstill.

1991

World Wide Web released by CERN.

1993

The introduction of the Mosaic graphical browser spurred intense general interest in the Internet, and spiders and crawlers began roaming the Internet.

1994

The introduction of Internet service providers and the commercialization of the Internet led to the first Internet shopping malls. In the same year the WWW took over from Telnet as the second most popular service on the Internet.

1995

The WWW became the service with the greatest amount of traffic on the Internet. See "World Wide Web Tops a Billion Pages"

What is the Internet?

The Internet is an interconnection of computer networks: a network of networks. The networks are connected by telephone lines, direct wires, fiber optics, satellite transmissions, etc.

Users at home access this network through the servers of their Internet Service Provider, or ISP. The speed at which information can be uploaded and downloaded depends upon a connection's bandwidth. Bandwidth is a measure of how much information can be sent or received in a given amount of time.
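As a rough worked example of what bandwidth means in practice, the sketch below estimates how long a transfer would take. The function name and the figures (a 1,000,000-byte file over a 56,000 bit/s dial-up link) are illustrative assumptions, and real transfers take longer because of protocol overhead and network congestion.

```python
def transfer_seconds(size_bytes, bandwidth_bits_per_second):
    """Idealized transfer time: total bits divided by bits per second."""
    if bandwidth_bits_per_second <= 0:
        raise ValueError("bandwidth must be positive")
    return (size_bytes * 8) / bandwidth_bits_per_second

# Assumed figures: a 1,000,000-byte file over a 56,000 bit/s link.
seconds = transfer_seconds(1_000_000, 56_000)
print(f"about {seconds:.0f} seconds")   # prints "about 143 seconds"
```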

To enable computers to exchange information the Internet uses the TCP/IP protocol (Transmission Control Protocol/Internet Protocol). A protocol is a set of rules and methods by which Internet devices establish connections and transfer information.

To direct information to its proper destination the Internet relies on two address systems:

  • Internet Protocol Addresses
  • Domain Name System

Internet Protocol Addresses, or IP addresses, are the numeric addresses used by network machines to uniquely identify each other. IP addresses are sometimes referred to as dotted-quad addresses because they are composed of four numbers of up to three digits each, separated by dots (periods). Each of the four numbers must be between 0 and 255. Example: 106.110.155.24.
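The format rule for IP addresses can be sketched in a few lines of Python. The function name is an illustrative assumption; Python's standard ipaddress module performs a more thorough check.

```python
def is_dotted_quad(address):
    """Return True if address looks like an IP address such as 106.110.155.24."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be one to three ASCII decimal digits...
        if not (part.isascii() and part.isdigit() and 1 <= len(part) <= 3):
            return False
        # ...with a value no greater than 255.
        if int(part) > 255:
            return False
    return True

print(is_dotted_quad("106.110.155.24"))
print(is_dotted_quad("106.110.155.256"))
```

The first call accepts the example address from the text; the second is rejected because 256 is outside the range 0 to 255.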

Although machines use them in extremely efficient ways, IP addresses are quite unwieldy for people, and therefore we have a parallel address system called the Domain Name System (DNS). DNS is a hierarchical address system which uses descriptive words to represent the IP addresses of Internet hosts, such as web servers.

In a DNS address, the right-most word represents its top-level domain. A top-level domain represents either a geographical area (.ca, .uk, .af) or a type of organization (.com, .org, .net).

Some common top-level domain names:

Name    Host type
.com    A commercial organization
.edu    An educational institution (generally a university)
.gov    A government (generally US gov) organization
.int    International organizations (inc. NATO)
.mil    A US military organization
.net    A network access provider
.org    Usually, a not-for-profit organization

Reading a DNS address from right to left, the name gets more specific until the name of the individual host computer is reached.

Example: http://www.jupiter.ccv.vsc.edu/

Reading from right to left, we see that this DNS address

  • is in the top level domain .edu, denoting an educational organization
  • is part of the Vermont State College system (vsc)
  • belongs to the Community College of Vermont system (ccv)
  • is on a CCV server named "jupiter"
  • refers to a World Wide Web server (www) running under the HTTP protocol
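The right-to-left reading described above can be sketched in Python. The function name is an illustrative assumption; the host name is the example used in the text.

```python
def labels_right_to_left(hostname):
    """Split a host name into its parts, most general (top-level domain) first."""
    return list(reversed(hostname.strip(".").lower().split(".")))

# Print each part indented one level deeper than the more general part before it.
for depth, label in enumerate(labels_right_to_left("www.jupiter.ccv.vsc.edu")):
    print("  " * depth + label)
```

The output starts with edu at the left margin and ends with www at the deepest level of indentation, mirroring how the name becomes more specific.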

The combination of a DNS address and the name and location (path) of a document at that address is called a Uniform Resource Locator, or URL. The various parts of a URL are separated by forward slashes.

Example: http://www.jupiter.ccv.vsc.edu/help/default.htm

Reading from right to left we see that this resource

  • is a hypertext document, or web page (htm)
  • named "default"
  • located in a folder named "help"
  • on the "jupiter" server of CCV, part of the VSC, which is an educational organization
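The parts of the example URL can also be picked apart with Python's standard urllib.parse module. The function name and the dictionary keys are illustrative assumptions, not standard terminology.

```python
import posixpath
from urllib.parse import urlsplit

def describe_url(url):
    """Break a URL into protocol, host, folder, document name, and extension."""
    parts = urlsplit(url)
    folder, document = posixpath.split(parts.path)   # URL paths use forward slashes
    name, extension = posixpath.splitext(document)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "folder": folder,
        "document": name,
        "extension": extension,
    }

info = describe_url("http://www.jupiter.ccv.vsc.edu/help/default.htm")
for key, value in info.items():
    print(f"{key}: {value}")
```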

Notice that while backslashes (\) are used on PCs to separate parts of a path, URLs follow the UNIX convention of using forward slashes (/).

As mentioned above, we users use DNS addresses to navigate the Internet, while our computers use IP addresses. Translation between the two is done transparently by special Internet servers which maintain databases that provide the links between DNS addresses and their corresponding IP Addresses.
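A toy version of that translation is sketched below. The lookup table stands in for the databases kept by name servers, and the pairing in it is invented for illustration (it simply reuses the example host name and example IP address from this section). A real program would ask the system's resolver instead, for example with Python's socket.gethostbyname, which needs a network connection.

```python
# Hypothetical name-to-address table; this pairing is made up for illustration.
NAME_TABLE = {
    "www.jupiter.ccv.vsc.edu": "106.110.155.24",
}

def resolve(name, table=NAME_TABLE):
    """Return the IP address recorded for name, or None if the name is unknown."""
    # DNS names are not case-sensitive, so normalize before looking up.
    return table.get(name.rstrip(".").lower())

print(resolve("WWW.Jupiter.CCV.VSC.edu"))
print(resolve("unknown.example"))
```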

Under the TCP/IP protocol, files and other types of information are broken down into packets before being transferred from one location to another. Special computers called routers forward the individual packets until all reach their destination where they are reassembled into their original form.
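The idea of breaking data into packets and reassembling it can be sketched as follows. This is a simplification under stated assumptions: the function names and the 8-byte payload size are illustrative, and real TCP/IP adds headers, acknowledgements, and retransmission of lost packets.

```python
import random

def split_into_packets(data, payload_size):
    """Break data into (sequence_number, payload) packets of at most payload_size bytes."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (seq, data[start:start + payload_size])
        for seq, start in enumerate(range(0, len(data), payload_size))
    ]

def reassemble(packets):
    """Put packets back in sequence order and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Files are broken into packets before being transferred."
packets = split_into_packets(message, payload_size=8)
random.shuffle(packets)            # simulate packets arriving out of order
restored = reassemble(packets)
print(restored == message)
```

Because each packet carries its sequence number, the receiver can restore the original order no matter which route each packet took.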

Internet Services

To perform specialized tasks, the Internet uses many protocols in addition to TCP/IP. A few examples:

Internet services and their protocols

Service                      Protocol
File Transfer                FTP
Telnet                       Telnet
Email                        SMTP/POP
UseNet News (Newsgroups)     NNTP
BitNet (Mailing Lists)
WAIS
IRC                          IRC
World Wide Web               HTTP

FTP

File Transfer Protocol (FTP) is a protocol designed specifically for transferring (uploading and downloading) files across the Internet.

Before the advent of HTTP and the World Wide Web, FTP servers were the "information warehouses" of the Internet and researchers searched them for information using specialized tools such as:

  • Gopher
  • Archie
  • Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display)
  • Veronica (Very Easy Rodent-Oriented Net-Wide Index to Computerized Archives)

Although FTP sites are still used to distribute documents and software, the World Wide Web has replaced them as the major repository of information on the Internet and those pioneer search tools have been supplanted by WWW search engines.

FTP downloads are made using a dedicated FTP client or an FTP-enabled browser. FTP clients are compact and portable, offer many enhanced capabilities, and are sometimes faster than browsers at transferring files.

FTP clients and FTP are still the chief method of uploading content to Web sites and performing file management on remote servers.

Telnet

Telnet is a text-based service which allows a user to establish a remote terminal connection with another computer and access its resources and services.

In the past, one of the most common uses for Telnet was to access the catalogue systems of university libraries. MUDs and MUSHes also utilize Telnet services.

While there are dedicated Telnet clients, many browsers can also utilize the Telnet protocol.

Ordinarily, connecting to a Telnet server (telnet://) requires a logon with a user account and a password.

Email

Email was one of the original Internet services. Email is transferred in a store-and-forward process using two separate protocols, POP and SMTP.

SMTP
SMTP (Simple Mail Transfer Protocol) is the protocol which handles the "long distance" transfer of email from one server to another.

POP
POP (Post Office Protocol) is the protocol used to transfer email stored on your mail server to your local machine where you can read it.
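As a hedged illustration of the sending side, the sketch below assembles a message with Python's standard email package. The addresses use the reserved example.com domain and the server name in the comments is hypothetical; nothing is actually sent. Retrieving mail over POP would use the separate standard poplib module.

```python
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email message ready to hand to an SMTP server."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message

# All addresses here are placeholders.
note = build_message("alice@example.com", "bob@example.com",
                     "Meeting notes", "The notes are in the shared folder.")
print(note["Subject"])

# Sending would use smtplib (not run here; "mail.example.com" is a hypothetical server):
#   import smtplib
#   with smtplib.SMTP("mail.example.com") as server:
#       server.send_message(note)
```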

To access an ISP's email server you need to open an email account, sometimes called a POP-account.

Programs designed to send, receive, and manage email are called email clients. Netscape Navigator and Internet Explorer include email clients, thus making it possible to send and receive mail directly from a browser. However, these clients are considerably less powerful than dedicated clients like Eudora, Pine, and Pegasus, which provide such features as sophisticated filters, multiple signatures and mailboxes, and the ability to check and manage multiple mail accounts.

Beginning around 1998 a new type of email service emerged: Web-based email. The proprietary software used by such services sends and receives mail using the standard SMTP protocol, but replaces the POP protocol with a web link using the hypertext transfer protocol (HTTP). Users can send and receive email anywhere they can access the World Wide Web and have no need for an email client, just a browser.

Such services are usually free to the user, as revenue is generated by advertising on the website and in every email sent. While they are free and convenient, these services do not provide many of the more sophisticated features associated with dedicated clients and a POP account. Also, these services can change their terms of service at any time they wish.

Examples of more or less popular webmail services are Gmail (Google), Hotmail (Microsoft), and Yahoo Mail.

Many holders of personal POP accounts also obtain free, "throw-away" web-based accounts and use them in their online presence. Such "spam-magnet" accounts reduce the volume of spam (unsolicited email) received by personal accounts.

Netiquette (Internet etiquette) is a set of rules for acceptable behavior when using email and the Internet in general. Good Netiquette guidelines include:

  • Make it easy for the recipient to identify you.
  • Always include a subject field entry which concisely summarizes the message's content.
  • Respect copyright laws.
  • When forwarding mail, or re-posting a message, do not change the wording.
  • Never send chain letters.
  • Don't flame and don't respond to flames.
  • Be careful when addressing mail. Avoid sending mail meant for an individual to a group.
  • Don't send unsolicited mail requesting help to people unknown to you.
  • Verify all addresses before sending long or personal messages to them.
  • Be careful when using sarcasm.
  • Give yourself a waiting period before sending mail with very emotional content.
  • Be brief. When quoting original material in a response, include only the relevant parts.
  • Signatures should be no longer than four lines.
  • Acknowledge receipt of mail if appropriate, even if you will reply later.
  • Do not send large amounts of unsolicited mail to people.

Other important considerations include:

  • If using Internet access at work, check with your employer about ownership of electronic mail.
  • Assume that mail is not secure: email is a post-card.

Usenet News

Usenet is a worldwide network of computers which, among other things, hosts what are called newsgroups. Usenet is like a huge bulletin board where people from all over the world can read and post messages. The "bulletin board" is divided into many topics, each representing an individual Newsgroup. Newsgroups are places where people with similar interests can exchange ideas and information.

A special program, called a newsreader, is needed to read and post Usenet messages. Both the Netscape and Internet Explorer browsers include newsreaders, but dedicated programs, such as Forte's Free Agent, have more powerful features.

"Lurking" is the term applied to reading postings to a newsgroup without posting to the group. Lurking is a desirable activity: it acquaints one with a group's ground rules and etiquette. Reading a group regularly and reading the group's FAQ are recommended before actually posting a message to a group.

Newsgroup names are composed of several parts separated by dots: comp.compilers, for example. The top-level part (comp=computers, sci=science, rec=recreational sports, hobbies, arts, etc.) is the most general description, and the name becomes more specific from there.
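The naming scheme can be sketched in Python. The function name is an illustrative assumption, and the table holds only the three top-level hierarchies named in the text; many others exist.

```python
# Only the top-level hierarchies mentioned in the text above.
TOP_LEVEL = {
    "comp": "computers",
    "sci": "science",
    "rec": "recreation (sports, hobbies, arts)",
}

def describe_newsgroup(name):
    """Return (top-level description, list of increasingly specific parts)."""
    parts = name.split(".")
    return TOP_LEVEL.get(parts[0], "unknown hierarchy"), parts[1:]

print(describe_newsgroup("comp.compilers"))
```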

The Usenet network is composed mostly of UNIX computers and it runs under the Network News Transfer Protocol (NNTP). The number of newsgroups hosted by Usenet has grown dramatically in recent years: from "several thousand" at the beginning of 1995 to more than 30,000 at the end of 1999, and over 50,000 by 2001.

See also "Frequently Asked Questions" and "Finding NewsGroup FAQs" below.

BitNet

BitNet (Because It's Time) brings us mailing lists whose primary purpose originally was to create communication links between academic communities. Archived information from these lists is a little difficult to get, but it is of very high quality (consider the sources) and a bonanza for researchers.

Created in 1981 at City University of New York (CUNY), it received a big push from IBM, which donated a mainframe. It links thousands of universities all around the world. The IBM protocols used by BITNET are different from those used by the rest of the Internet, so the two must be connected by a gateway.

Unlike other Internet features, a BitNet file can be blocked by the failure of even one node. BITNET does not provide an interactive log-in the way FTP and Telnet do.

Mailing lists run automatically using specialized software over a network server. The three original list management programs were LISTSERV, LISTPROC, and MAJORDOMO.

List management software automatically adds and removes subscribers, receives and re-sends user posts, and provides searchable indexing.

Thus, mailing lists transform email, which is one-to-one communication, into one-to-many communication and preserve it as part of a body of knowledge.

Listees have the option of receiving posts as they are received by the server, or of having them sent once a day in digest form.

When replying to a mailing list be sure to send your message to the list, and not the list server. The server address is used only for subscribing and unsubscribing.

The functions of BitNet mailing lists have been largely taken over by web-based specialty user groups.

IRC, Talk, CU See Me

IRC (Internet Relay Chat) provides interactive verbal communication between multiple users in widely separated locations.

Like CB radio, IRC is spontaneous, not private, and has developed its own subculture (users use nicknames for example) and rules of etiquette. Unlike CB radio, participants need not be located within a local geographic area to communicate with each other.

To obtain FAQs for IRC, go to:

http://www.funet.fi/~irc/
and
http://www.undernet.org/

To get the IRC Primer, go to:

ftp://nic.funet.fi/pub/unix/irc/docs/IRCprimer1.1.txt
ftp://cs-ftp.bu.edu/irc/support/IRCprimer1.1.txt
ftp://coombs.anu.edu.au/pub/irc/docs/IRCprimer1.1.txt

CU See Me, developed by Cornell University, is a program that transmits audio and video across the Internet for the purpose of video-conferencing.

World Wide Web

The term hypertext was coined in 1963 by Ted Nelson (and first published in 1965) to describe text that is not constrained to be sequential. Hypertext links surpass mere footnotes in their ability to supply additional information.

Palo Alto Research Center (PARC) introduced a Lisp-based hypertext system and Apple bundled HyperCard with Macintoshes. In 1989 a CERN researcher proposed a hypertext system to enable efficient information sharing for members of the high-energy physics community. By 1990 it was running as a prototype, and it was made available on CERN machines in 1991.

Growing interest in the WWW was spurred in 1993 by the release of Mosaic, a graphical interface. In 1994 more browsers were announced, including Spry and Netscape Navigator.

The main protocol used by the World Wide Web service is hypertext transfer protocol, or HTTP.

Secure web pages use the HTTPS protocol which transfers information in an encrypted, or secure, format.

As the protocol of the World Wide Web, HTTP is the most widely used protocol for transferring information on the Internet.
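To give a feel for what HTTP looks like on the wire, the sketch below builds the text of a minimal GET request, the kind of message a browser sends to ask a server for a page. The function name is an illustrative assumption, and real browsers send many more header lines than the two shown here.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for url."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the site's root
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)         # HTTP lines end with carriage return + line feed

request = build_get_request("http://www.jupiter.ccv.vsc.edu/help/default.htm")
print(request)
```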

Web pages are formatted using HTML (Hypertext Markup Language) codes. HTML codes tell a web browser how to display information from a web page. By convention, HTML pages have either an .htm file extension (common on Windows servers) or an .html extension (common on Unix and Linux servers).
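As a small illustration of how a program reads HTML codes, the sketch below uses Python's standard html.parser module to pick out the link targets in a page. The class name and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A made-up page for illustration.
page = '<html><body><h1>Help</h1><p>See the <a href="default.htm">index</a>.</p></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)   # prints ['default.htm']
```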

Although completely textual at its beginning, the World Wide Web has long since acquired multimedia capabilities. Browsers are equally adept at displaying text, still and animated graphics, and video, as well as playing audio files.

A web page's address is known as its uniform resource locator, or URL for short. A Web site's default landing page is known as its home page.

Note that a web browser can be customized to open a home page of the user's choosing when it is launched. See Browser Tips, Setting Your Homepage.

Web Browser

A web browser is the Internet client used to access and display the contents of World Wide Web documents which have been "marked up" with HTML formatting codes, or "tags".

Browsers are designed in modular fashion. "Plug-ins", now standard in all browsers, allow them to deliver audio and video content as well as provide access to other Internet services, including FTP, Telnet, and newsgroups.

When browsers cache visited web pages, they copy them to the local machine's hard disk drive so that subsequent visits load faster. Clicking Refresh or pressing F5 tells the browser to reload the page from the server rather than from the cache.
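The caching behavior described above can be sketched as follows. This is a toy model, not how any particular browser is implemented: the class name is an assumption, the cache lives in memory rather than on disk, and a stand-in function replaces the network fetch so the example runs offline.

```python
class PageCache:
    """Keep local copies of fetched pages; fetch is any function taking a URL and returning content."""
    def __init__(self, fetch):
        self._fetch = fetch
        self._pages = {}

    def load(self, url, refresh=False):
        # A refresh bypasses the cache and asks the server again.
        if refresh or url not in self._pages:
            self._pages[url] = self._fetch(url)
        return self._pages[url]

# Stand-in for a network fetch; it records each call it receives.
calls = []
def fake_fetch(url):
    calls.append(url)
    return f"contents of {url} (fetch #{len(calls)})"

cache = PageCache(fake_fetch)
first = cache.load("http://www.yahoo.com/")
second = cache.load("http://www.yahoo.com/")                # served from the cache
third = cache.load("http://www.yahoo.com/", refresh=True)   # fetched again
print(len(calls))   # prints 2
```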

Mosaic
The first Windows compatible GUI browser to achieve popularity.

Lynx
A text-based (non-GUI) browser. Very fast: use this one when you are in a hurry and not concerned with graphic information.

Opera
WYSIWYG browser; faster than Netscape and Explorer.

Firefox
Graphical browser.

Netscape Communicator
Graphical browser; written by many of the software engineers who wrote Mosaic.

Chrome
Google's graphical browser.

Microsoft Internet Explorer
Microsoft's graphical browser.

Directory Sites

Directory sites are lists of other sites, organized by subject in a top-down arrangement. This makes them ideal for narrowing down a general topic to more and more specific sub-topics.

An example of a directory site is Yahoo ("Yet Another Hierarchical Officious Oracle"), which lists over 80,000 sites in 14 top-level categories.

http://www.yahoo.com/

Also try:

http://galaxy.einet.net
http://www.w3.org/hypertext/DataSources/bySubject/Overview.html
http://nearnet.gnn.com/gnn/wic/index.html

To find other directory sites use a directory of directories, such as the Clearinghouse for Subject-Oriented Internet Resource Guides at the University of Michigan.

http://www.lib.umich.edu/chhome.html

Search Engines

Search engines are used to find specific information. They search for your keywords and display a "results page" which lists Web pages that match your search criteria.

Search engines do not search the Internet directly. They search databases which have been created by software programs called crawlers, spiders, robots, or simply "bots". These programs independently roam the Net looking for new web sites and new web pages to index. When they find something new or changed, they create a database entry containing the URL and keywords from the contents.
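The database-of-keywords idea can be sketched with a small inverted index, which maps each word to the pages that contain it. This is a minimal sketch under stated assumptions: the function names and the two sample pages are made up, and real search engines also rank results and handle far larger collections.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs that contain every keyword in query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

# Made-up pages standing in for what a crawler might have collected.
pages = {
    "http://example.com/ftp.htm": "FTP transfers files across the Internet",
    "http://example.com/web.htm": "The World Wide Web transfers hypertext pages",
}
index = build_index(pages)
print(search(index, "hypertext transfers"))
```

The search looks only at the prepared index, never at the pages themselves, which is why a search engine can answer quickly but may miss pages its crawler has not yet visited.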

Meta-search engines submit your keywords to multiple search engines and return the results from all of them at the same time. Many meta-search sites are now simply ad-farms designed to harvest clicks for pay.

See http://www.hermit.cc/mania/search.htm for a list of search engines and other World Wide Web resources.

To learn more about using search engines, check out U.C. Berkeley's Finding Info on the Internet tutorial.

Download sites

Frequently Asked Questions (FAQs)
FAQs are a convenient method for distributing often requested information. They take the form of questions and answers and they are similar to leaflets and flyers but they are more accessible and up to date. They also cost much less to publish and can therefore serve smaller communities of interest.

True FAQs result from the collaboration of groups of individuals or organizations. They are public, they come from many contributors and go to many recipients, and they are authoritative. Their authoritativeness derives from having been reviewed and accepted by the community they serve.

All FAQs are copyrighted. They are legally the intellectual property of those who publish them. Copyright notices on individual FAQs vary from extremely liberal to very restrictive.

Types of FAQs

Newsgroup FAQs
Newsgroup FAQs offer information about a particular newsgroup. This information might include suggestions for appropriate topics, the format for postings, rules concerning commercial postings (usually not allowed on newsgroups), and newcomer questions and answers.

Topical FAQs
Topical FAQs present information on a specific topic. For example: an FAQ on meditation which answers such questions as What is meditation?, How does one meditate?, When does one meditate?

Business and Commercial FAQs
FAQs published by organizations or businesses. Examples: a midwifery organization publishes a FAQ on how to become a midwife; computer consultants publish solutions to common problems.

Finding NewsGroup FAQs

Watch the postings in any newsgroup in which you are interested for articles named FAQ or Periodic Posting. Also see the newsgroups news.announce.newusers and news.answers, or any other *.answers group.

FAQs in the News Archive at MIT
Use FTP to connect to the server rtfm.mit.edu. FAQs are in the pub/usenet directory. Use FTP to obtain the FAQ you want.

On the World Wide Web
Use a browser to connect to:

http://www.faqs.org/usenet/usenet.html
(from Angela)

Email
Note: The rtfm.mit.edu mail server was turned off as of October 2009. The files it served may be accessed at:

ftp://penguin-lust.mit.edu/pub.

Glossary

Archie. Archie is a service that keeps track of the contents of most of the FTP sites on the Internet. Archie searches file titles for keywords and returns addresses for "hits".

FAQ. Frequently-Asked-Questions (and answers).

HTTP. Hypertext transfer protocol. Specifies the operations specific to the Web, such as hyperlinking.

IP. Internet Protocol is the network layer for the TCP/IP protocol suite. It is for packet-switching.

link. A link is a place in a hypermedia document that holds the information that identifies a place to jump to (a URL) in a different hypermedia document. Also known as a hotspot, or anchor.

NNTP. Network News Transfer Protocol used for Usenet news distribution.

POP. Post Office Protocol. The POP server is the server that hosts your email address and stores your mail until you download it to your personal computer.

Packet. A packet is a unit of data sent across a network. Large file transfers result in large numbers of packets being sent across a network.

PPP. Point-to-Point Protocol. See SLIP.

Protocol. A technical description of the format for a message and the rules to be followed by two or more computers as they exchange the message.

SLIP. Serial Line IP. SLIP and PPP are the two protocols that allow home computer users to connect their computers to the Internet as peer hosts. They encapsulate TCP/IP packets for transmission over phone lines.

TCP/IP. Transmission Control Protocol/Internet Protocol. Two standards which work together to reliably send and receive blocks of data across the net. These protocols (sets of rules) for Internet communication between computers allow PCs, Apple computers, mainframes, and Unix systems to freely exchange information.

URL. A Uniform Resource Locator is a standardized description of the location of a resource on the Internet. It consists of the access protocol, the host name, and the complete directory path of the file, with the parts of the path separated by forward slashes.




Bibliography

Ginsburg, Mark and December, John. HTML 3.2 and CGI Unleashed. 1996. Sams Net, Indianapolis, IN.

McGregor, Pat. Mastering the Internet, 2nd Edition. 1996. Sybex, San Francisco.

Miller, Robert and Keeler, Melissa. Internet Direct. 1995. MIS: Press, NY, NY.

Rowland, Robin and Kinnaman, Dave. Researching on the Internet. 1995. Prima Publishing, Rockland, CA.

Stout, Rick. The World Wide Web, Complete Reference. 1996. Osborne, Berkeley, CA.




Bruce Miller, 2002, 2014