MAKE YOUR PAGES EASY TO FIND

By Scott Fields

In this Workshop, I revisit a topic that was last discussed almost a year ago, but which continues to rank high on the list of things new Web authors want to know: how to get your site listed with search engines. I'm not going to discuss strategies for improving your ranking, because that topic varies greatly depending on which search engine you're working with and how ethical you intend to be. However, I'll show you some very simple ways to ensure that, when your site is listed with a search engine, it is portrayed both accurately and informatively.

I've also expanded this Workshop from its previous incarnation, adding more links to popular search engines and including information on how to get your site listed only with the search engines you choose and how to keep your site OFF of certain lists.


EEEEK! A SPIDER!

If you've never seen a spider on the World Wide Web, don't be surprised. These creatures slink around behind the scenes, working day and night to fetch data for their masters, the search engines. You don't need to fear these spiders. They're friendly; indeed, they're downright necessary if you want your Web pages to appear in many of the most popular search engines.

Spiders are nothing more than software agents (sometimes also called robots) that crawl the Web and send data back to a search engine. The search engine then takes that data and uses it to create an index. When you use a search engine to look for Web sites, you are actually searching that index, not the Web itself.

In this Workshop, I discuss some of the ways you can make your Web site more friendly to spiders. Doing so helps your site get listed on some search engines, and it enables you to provide better information to users who view your listing on those search sites.


The difference between search engines and directories

Before getting into the details of how you can prime your Web site for indexing by search engines, you need to know how search engines work. This section provides only a general overview; each search engine uses its own set of rules and guidelines for indexing pages. If you want to research a specific search engine, check out the links at the end of this Workshop.

The first thing you need to know is that the term "search engine" is often improperly used to refer to directories. The primary difference between search engines and directories is that directories are compiled by people, while search engines rely on software to do all the work. By far the most popular and successful directory is Yahoo!, which has compiled a huge database of Web sites and arranged them according to category.

Typically, directories are more selective than search engines, and while they don't include as many sites, they generally provide links to the better sites. When you search the Yahoo! directory for "comic books," for example, you get a handful of comic book-related categories such as the following:

  • Entertainment: Comics and Animation: Comic Books
  • News and Media: Television: Shows: Science Fiction, Fantasy, and Horror: X-Files, The: Products: Comic Books

The search also returns hundreds of sites--from other categories--that contain the phrase "comic books."

In contrast, a search engine uses spider software to crawl the Web and return data about all the pages it finds. This data is then used to compile a huge index, which in turn is searched by visitors. Because the search engine index indiscriminately reflects the contents of a wide range of pages, you usually receive many more matches when you perform a search. For example, AltaVista (a search engine) recently returned more than 100,000 matches when I searched for the term "comic books."

If you want to get your site listed someplace where a lot of people can quickly locate you, directories might sound like the way to go--but there's a catch. Just as directories rely on humans to classify each site, they also count on site creators to submit pages for listing. As the author of a Web page, you typically submit a short description of your site and request inclusion in the appropriate category. Then you wait, and wait, and wait, and hope that you'll eventually show up in the list. What's more, each directory has its own criteria for which sites it lists and in which categories, so there's a chance that your site won't make the cut.

To get your site included in a search engine index, on the other hand, you usually need only to submit the URL for your primary page and then sit back and wait for that site's spider to come and crawl your site. However, before you submit the URL, you need to prepare your pages to help the spider return accurate information and to ensure that the search engine lists your pages in the best possible manner.


HOW SEARCH ENGINES WORK

Search engines are built from three basic parts, starting with the spider software I've already introduced. When a spider first visits your page, it reads that page's contents and then follows the links it finds to read the other pages on your site. Some spiders also follow the links from your site to other sites. Typically, the spider returns to the site on a regular basis--about every month or two--to look for any changes.
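
To make the link-following step a bit more concrete, here is a minimal Python sketch of how a spider might pull the links out of a page and sort them into "same site" and "other site" piles. This is only an illustration, not the code any real search engine uses; the names LinkCollector and split_links and the example.com addresses are my own assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the HREF of every <A> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> works too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links such as "catalog.html" into full URLs.
                    self.links.append(urljoin(self.base_url, value))


def split_links(base_url, html):
    """Return (same_site_links, other_site_links) found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    home = urlparse(base_url).netloc
    same = [u for u in collector.links if urlparse(u).netloc == home]
    other = [u for u in collector.links if urlparse(u).netloc != home]
    return same, other


page = ('<A HREF="catalog.html">Catalog</A> '
        '<A HREF="http://www.example.org/cats/">Cat links</A>')
same, other = split_links("http://www.example.com/index.html", page)
print(same)   # links the spider would follow within your site
print(other)  # links that only some spiders would follow
```

A real spider would go on to fetch each link in the first list and repeat the process, which is why a page that nothing links to is hard for a spider to discover.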

The second part of the search engine is its index. Upon visiting your site, the spider sends the contents, or at least a portion thereof, back to the search engine's index. The index (sometimes called a catalog) is a giant database that contains a copy of every page the spider has crawled. When the spider returns to your updated site, the index is updated to reflect those changes.

The index for the typical search engine is huge, with thousands of pages being added every day. As you can imagine, it usually takes a while for new pages (or changes) to appear in the index after they've been crawled by the spider. What's more, until a page gets indexed, it isn't available to the third part of the search engine: the software. The search engine software is the program that sifts through the millions of entries in the index, locates matches, and sorts and displays those matches at the search site.
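
If it helps to picture the index and the search software working together, the following Python sketch builds a tiny index from two made-up pages and then searches it. It is a deliberately simplified assumption on my part: real indexes store far more than a word list, and the function names build_index and search are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Two made-up pages standing in for what a spider sent back.
pages = {
    "http://www.example.com/pajamas.html": "Cat pajamas for the discriminating cat",
    "http://www.example.com/comics.html": "Comic books about a cat detective",
}
index = build_index(pages)
matches = search(index, "cat pajamas")
print(matches)
```

Notice that the search never touches the pages themselves, only the index. That is why a page that has been crawled but not yet indexed can't turn up in anyone's results.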

Again, you can find more information about how a specific search engine's software sorts and ranks the sites in its index by visiting the links at the end of this Workshop. For the most part, however, the general rules explained in the following sections should ensure accurate placement and ranking of your site.


LOCATION, LOCATION, LOCATION . . . AND FREQUENCY

We've all heard the adage that the three most important factors for a good business are location, location, and location. When it comes to getting your site listed with a search engine, that adage still rings true--but in this case, your important factors are location, location, location, and frequency.

The folks who use search engines tend to search for a word or phrase, called a "keyword." Each search engine follows its own set of rules concerning the location and frequency of keywords on a Web page, but in general the engines look for four things:

  • Keywords in the page's title
  • Keywords near the top of the page
  • Keywords in the <META> tags
  • Frequency of keywords throughout the page


Judging pages by their titles

When you go to a library or bookstore to look for a book about a particular topic, chances are you start by looking through the book titles. Similarly, when a search engine scans its index, it generally gives a lot of weight to titles containing the keywords it's looking for. Therefore, to help ensure proper placement of your page within the index, be sure to put an accurate title on each page. Even better, make sure your title contains the keywords that most appropriately reflect the contents of your page.

For example, suppose your page is devoted to nightwear for felines, and you want to make sure your site is retrieved whenever a person searches for "cat pajamas." The first thing you would do is include "cat pajamas" in the title of the main page, as I did in Code Snippet #1.

Note that the title is included within the head of the HTML document, so both the opening <TITLE> and closing </TITLE> tags--and their contents--must appear within the opening <HEAD> and closing </HEAD> tags. Also, the text you put within the <TITLE> tag appears in the title bar across the top of the window when a visitor views your page in a browser. Don't make your title so long that it doesn't all fit within the width of the typical browser window. (Check out the results of Code Snippet #1 to see the text in your browser's title bar.)

  CODE SNIPPET # 1
--Begin copy and paste here--

<HTML>
<HEAD>
<TITLE>Cat Pajamas: We've Got Kitty Covered</TITLE>
</HEAD>
<BODY>
Insert the body of your document here...
</BODY>
</HTML>

-- End copy and paste here --

(To view the results of this code, paste it into a text file or HTML editor, save it with the .html extension, and then view it in your Web browser.)

Some Web authors litter a page title with nothing but a series of keywords (or even the same keyword multiple times) in an effort to attain a higher ranking. Depending on the specific spider software and the emphasis placed on a document's title, the effectiveness of this technique varies. However, the one thing you can be sure of is that visitors who load your page will see your keywords listed across the window's title bar. Obviously, this isn't the best-looking option, and it doesn't help your visitors navigate your site, either.
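
If you'd like to check a page's title the way a spider might read it, here is a small Python sketch that pulls out the contents of the <TITLE> tag and tests whether a keyword phrase appears in it. The TitleReader class and read_title function are hypothetical names of my own, and a real spider may well treat titles differently.

```python
from html.parser import HTMLParser


class TitleReader(HTMLParser):
    """Remembers the text found between <TITLE> and </TITLE>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_title(html):
    """Return the page title with line breaks and extra spaces collapsed."""
    reader = TitleReader()
    reader.feed(html)
    reader.close()
    return " ".join(reader.title.split())


page = """<HTML>
<HEAD>
<TITLE>Cat Pajamas: We've Got Kitty Covered</TITLE>
</HEAD>
<BODY>
Insert the body of your document here...
</BODY>
</HTML>"""

title = read_title(page)
print(title)
print("cat pajamas" in title.lower())  # is the keyword phrase in the title?
```

You could also print len(title) to get a feel for whether the title is short enough to fit in a browser's title bar.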


Starting off on the right foot

Continuing the metaphor of searching for a book at the library or bookstore, imagine that you find one with an appropriate and descriptive title that relates to the topic you seek. Chances are, the next thing you'll do is scan the table of contents and introduction to see whether it suits your needs, right? Well, once again, the search engine is likely to behave in a similar manner, scanning a Web page to see whether the keywords appear near the top of the document.

In some cases, the search engine gives even more weight to keywords that appear in a headline, so if it suits your page layout, including a descriptive headline at the top of your page doesn't hurt (this is what I did in Code Snippet #2). At the very least, include the appropriate keywords in the introductory paragraph, for the sake of both the search engine and your visitors. (After all, human visitors also are likely to scan the top of your page for the appropriate information before they decide whether to stay or go.)

  CODE SNIPPET # 2
--Begin copy and paste here--

<HTML>
<HEAD>
<TITLE>Cat Pajamas: We've Got Kitty Covered
</TITLE>
</HEAD>
<BODY>
<H1>Cat Pajamas: Nightclothes for the discriminating
cat</H1>
<P>
Welcome to the Cat Pajamas Web site, where you'll find all the latest
in distinctive nightclothes for the cat in your life. Not only do we
offer same-day shipping on all products listed in this site, but we'll
set you up with links to all manner of feline-related pages.
</BODY>
</HTML>

-- End copy and paste here --

(To view the results of this code, paste it into a text file or HTML editor, save it with the .html extension, and then view it in your Web browser.)


I never <META> keyword I couldn't use

The final place a search engine is likely to look for matching keywords is the one place your site's visitors aren't going to see them: in a <META> tag. A <META> tag is a secret little thing that hangs out in the head of your HTML document and provides information about that document--and the most useful information it can provide is a list of keywords. Unlike the title and other visible sections of your document, the <META> tag lets you simply list any and all keywords that are appropriate for your site. (Hint: It's a good idea to include popular misspellings of your keywords, as well as foreign spellings.)

Creating search engine entries with <META> tags is actually a two-part process. First, you create a keyword <META> tag using the following syntax, where you enter a list of keywords (each separated by a comma) after "CONTENT=":

<META NAME="keywords" CONTENT=>

Next, you create a description <META> tag using this syntax, where you enter a brief description after "CONTENT=":

<META NAME="description" CONTENT=>

For an example of how these <META> tags might look within a document, check out Code Snippet #3. As you can see, both of these tags need to be within the head of your HTML document. What you can't see (but what I'll tell you) is that the keyword <META> tag provides indexing information for search engines, and the description <META> tag tells the search software what to display when your page appears with the other results. Keep in mind that an accurate (but brief) description helps your page stand out among the others that contain similar keywords.

  CODE SNIPPET # 3
--Begin copy and paste here--

<HTML>
<HEAD>
<TITLE>Cat Pajamas: We've Got Kitty Covered</TITLE>

<META NAME="keywords" CONTENT="cat pajamas, cat's pajamas, cats'
pajamas, cat pajama, cat, cats, kitties, feline, felines, pajama,
pajamas, pj, pjs, pj's, pyjamas, jammies, nightclothes, nightgowns,
nighties, nightware, sleepwear">

<META NAME="description" CONTENT="Cat Pajamas:
Nightclothes for the discriminating cat.">

</HEAD>
<BODY>
<H1>Cat Pajamas: Nightclothes for the discriminating
cat</H1>
<P>
Welcome to the Cat Pajamas Web site, where you'll find all the latest
in distinctive nightclothes for the cat in your life. Not only do we
offer same-day shipping on all products listed in this site, but we'll
set you up with links to all manner of feline-related pages.
</BODY>
</HTML>

-- End copy and paste here --

(To view the results of this code, paste it into a text file or HTML editor, save it with the .html extension, and then view it in your Web browser.)

When you use <META> keywords, keep in mind that sneaky tricks--like loading your <META> tag with the same keyword over and over or including popular search phrases that don't apply to your page--often HINDER your search ranking. While "free sex" may be the undisputed king of most-used search terms, including it 200 times in your <META> tag isn't going to magically make your page appear in more searches. Indeed, seeding your <META> tag with multiple copies of the same keyword may alert the search engine software that you're trying to fool it, which in turn may get you bounced out of the index.

You should also use caution when adding trademarked or copyrighted keywords without good cause. Putting "Tommy Hilfiger" in the <META> tag for your Cat Pajamas page, for example, is more likely to bring a lawsuit than customers.
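
Before you submit a page, you may want to double-check what its <META> tags actually say. The Python sketch below reads the keywords and description out of a page and flags any keyword that is listed more than once. It is only an illustration under my own assumptions: the names MetaReader, read_meta, and repeated_keywords are hypothetical, and the idea that an exact repeat is what a search engine would object to is a simplification.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collects the CONTENT of each named <META> tag."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name = (fields.get("name") or "").lower()
            if name:
                # Collapse line breaks inside the CONTENT value.
                content = fields.get("content") or ""
                self.meta[name] = " ".join(content.split())


def read_meta(html):
    """Return a dictionary such as {"keywords": "...", "description": "..."}."""
    reader = MetaReader()
    reader.feed(html)
    return reader.meta


def repeated_keywords(keyword_string):
    """Return the keywords that appear more than once, with their counts."""
    keywords = [k.strip().lower() for k in keyword_string.split(",") if k.strip()]
    counts = Counter(keywords)
    return {k: n for k, n in counts.items() if n > 1}


page = """<HEAD>
<META NAME="keywords" CONTENT="cat pajamas, cats,
pajamas, cats, pyjamas">
<META NAME="description" CONTENT="Cat Pajamas:
Nightclothes for the discriminating cat.">
</HEAD>"""

meta = read_meta(page)
print(meta["description"])
print(repeated_keywords(meta["keywords"]))
```

In this made-up example, "cats" is listed twice, so it shows up in the report. A single accidental repeat is harmless; the point is to catch the kind of overloading described above before a search engine does.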


Come here often?

One final criterion affects your page's rank within most search engines: frequency. If a search engine finds that a keyword appears ten times on your page, but only once on another page, it will likely rank your page higher. However, just as search engines don't appreciate the intentional overloading of <META> keywords, they don't appreciate the intentional overloading of keywords elsewhere within your document. If the search engine or a visitor catches you repeating popular keywords gratuitously to boost your ranking, you may soon find that you can't get your page listed at all.
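
To get a rough idea of how often a keyword appears on your page, you can count it yourself. Here is a minimal Python sketch; keyword_frequency is a hypothetical helper of mine, and no search engine publishes the exact way it counts, so treat the numbers as a sanity check rather than a score.

```python
import re


def keyword_frequency(text, phrase):
    """Count whole-word occurrences of phrase in text, ignoring case."""
    words = phrase.lower().split()
    if not words:
        return 0
    # Allow any run of spaces or line breaks between the words of the phrase.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text.lower()))


text = ("Welcome to the Cat Pajamas Web site, where you'll find all the latest "
        "in distinctive nightclothes for the cat in your life.")

print(keyword_frequency(text, "cat pajamas"))
print(keyword_frequency(text, "cat"))
```

If a count looks suspiciously high when you read the page aloud, it probably looks that way to a search engine, too.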


Let's work together

If nothing else, I hope that today's Workshop has helped you understand that there is no single, magical way to help boost your search engine ranking. In particular, don't think that <META> keywords are the panacea that will ensure top placement every time. No matter the subject of your page, your best hope for pulling in visitors is to create good, descriptive pages that are easy to read and easy to navigate. Only by combining useful information in a number of key places can you hope to provide a useful search result, and only by posting an attractive and informative site can you hope to capitalize on those results and attract visitors.


KEEPING THE CREEPY CRAWLIES OFF YOUR PAGE

At the end of this Workshop, I list links to some of the more popular search engines and their instructions for getting your site indexed. Although each search engine allows you to submit your page for indexing, it's important to note that your site may be indexed even if you don't submit it. In some cases, spiders may find your site by following a link from another site; in other cases, they may just happen upon your site randomly. In either event, if you don't want your site indexed, you have to stop the spiders before they enter.

Putting the kibosh on spider software is a simple matter, accomplished with a <META> tag similar to those previously discussed in this Workshop. In this case, however, you use the robot <META> tag, which looks like this:

<META NAME="robots" CONTENT=>

The robot <META> tag gives you a range of control over visiting spiders, which you invoke by setting CONTENT to one or more of the following terms:

ALL
NONE
INDEX
NOINDEX
FOLLOW
NOFOLLOW

If your goal is to tell spider software to leave your page alone, you simply set CONTENT to "none", like this:

<META NAME="robots" CONTENT="none">

As you can see in Code Snippet #4, this <META> tag again belongs between the opening <HEAD> and closing </HEAD> tags.

  CODE SNIPPET # 4
--Begin copy and paste here--

<HTML>
<HEAD>
<TITLE>Cat Pajamas: We've Got Kitty Covered</TITLE>

<META NAME="robots" CONTENT="none">

</HEAD>
<BODY>
<H1>Cat Pajamas: Nightclothes for the discriminating
cat</H1>
<P>
Welcome to my personal site, where you'll find pictures of my cats
wearing a wide variety of nightclothes.
</BODY>
</HTML>

-- End copy and paste here --

(To view the results of this code, paste it into a text file or HTML editor, save it with the .html extension, and then view it in your Web browser.)

As you've likely guessed upon seeing the list of possible CONTENT terms, the robot <META> tag lets you do more than just scare away spiders:

  • ALL invites spiders to explore the page and all linked pages
  • NONE tells spiders not to index the page and not to explore its links
  • INDEX invites spiders to index the page (that's the default)
  • NOINDEX keeps the page itself out of the index, although its links may still be explored
  • FOLLOW means that spiders are welcome to follow links from this page to others (also the default)
  • NOFOLLOW allows the page to be indexed, but no links from the page are to be explored

To include more than one of these terms, separate them with commas. In most cases, though, you can define your intent with just one term. ALL is equivalent to INDEX, FOLLOW; NONE is equivalent to NOINDEX, NOFOLLOW.
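
To see how these terms combine, here is a short Python sketch of how a well-behaved spider might interpret the CONTENT value. The function name parse_robots is hypothetical, and this sketch assumes that a spider indexes and follows by default and that later terms override earlier ones; individual spiders may behave differently.

```python
def parse_robots(content):
    """Return (index, follow) flags for a robots CONTENT value."""
    index, follow = True, True  # assumed default when nothing is specified
    for term in content.split(","):
        term = term.strip().lower()
        if term == "all":
            index, follow = True, True
        elif term == "none":
            index, follow = False, False
        elif term == "index":
            index = True
        elif term == "noindex":
            index = False
        elif term == "follow":
            follow = True
        elif term == "nofollow":
            follow = False
        # Unknown terms are simply ignored in this sketch.
    return index, follow


print(parse_robots("none"))              # same result as "noindex, nofollow"
print(parse_robots("NOINDEX, NOFOLLOW"))
print(parse_robots("noindex, follow"))   # skip this page, but explore its links
```

The last combination is handy for a page of links that isn't worth listing itself but points to pages you do want found.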


LINKS YOU CAN USE

Formatting your Web page according to the guidelines discussed in this Workshop only prepares it for listing with the search engines. You still have to stop by each search site and submit your URL so the spiders know where to find your page--unless you want to take the chance that a spider will wander by or will find you from a link within another page.

As you visit search engine sites, be sure to check out the specific guidelines to ensure an accurate listing. Each site will have slightly different criteria that set it apart from the others. To get you started, I've compiled a list of eight popular search engines, along with links to the pages that tell you how to submit your site. Good luck!

AltaVista
http://www.altavista.com/av/content/addurl.htm
Excite
http://www.excite.com/info/add_url
Google
http://www.google.com/addurl.html
HotBot
http://www.hotbot.com/addurl.asp
Infoseek
http://infoseek.go.com/AddUrl?pg=SubmitUrl.html
Lycos
http://www.lycos.com/addasite.html
Northern Light
http://www.northernlight.com/docs/regurl_help.html
WebCrawler
http://www.webcrawler.com/info/add_url/