TechVideos
Copyright 2000

The Importance of Intranet Search Engines and Techniques to Improve Search Results

ABSTRACT

Search engines have become extremely important tools whose accuracy and scope are critically important to users. A Web site is useless as an information resource if users cannot find the information that they are looking for quickly and efficiently. To help improve finding information on the Web, this paper will examine 1) the importance of Intranet search engines, 2) how to prepare pages for searching, and 3) how to build more powerful searches.

INTRODUCTION

With the wide use of the Web today, Intranet and Internet search engines have become extremely important tools whose accuracy and scope are depended upon by users. No matter how well sites are designed and organized, finding information would be much more tedious and time consuming if search engines did not exist. These tools allow users to type keywords into a form, press a search button, and get a list of documents that match the keywords. As simple as this process may sound, many relevant documents may go unexamined because of incomplete indexing of documents, poorly formatted documents, and unspecific queries. To help improve finding information on the Web, this paper will examine 1) the importance of Intranet search engines, 2) how to prepare pages for searching, and 3) how to build more powerful searches.

THE IMPORTANCE OF INTRANET SEARCH ENGINES

Intranet Web sites are a great way to disseminate information both internally and externally. However, a resourceful Web site is useless if users cannot find the information that they are looking for, resulting in lost time and lost opportunities. If users cannot find what they are looking for, they will probably move on to another site. Even Web sites that contain only a few pages should be searchable. Assuming that a site is searchable because it has been submitted to and indexed by a major Internet search engine (e.g., AltaVista, HotBot, or Lycos) can be a costly mistake. Because these external search engines have indexed only a small fraction of the documents on the Web, it is unlikely that a corporate Web site has been completely indexed by any one of them. For example, AltaVista, which claims to be the most comprehensive search engine, has 140 million pages indexed ("About AltaVista," 1999). However, the Web is currently estimated to contain over 1 billion documents ("Important Things to Know," 1999). This implies that approximately 86% of all documents on the Web cannot be located directly using AltaVista. Similarly, HotBot has indexed 110 million pages on the Web (approximately 11%), leaving 89% unindexed ("Wired Digital," 1999).

To illustrate the incomplete indexing of documents, HotBot's Check URL application (http://www.hotbot.com/help/checkurl.asp) was run on a sample site. The Check URL application verifies whether a URL has been indexed in HotBot's database. The site chosen for testing was the Web site for the College of Business and Industry at Northeastern State University (http://arapaho.nsuok.edu/~cbi). HotBot's Check URL application indicated that only 4 documents at this site have been indexed. With 14 cross-linked documents available at this highly visible site, HotBot is currently indexing only 29% of the available documents. While this indexing percentage is much higher than that of the average Web site, it illustrates that time might be wasted and opportunities lost if the College of Business and Industry relied exclusively on external search engines such as HotBot to completely index its Web site.
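
The percentages quoted above follow from simple ratios. The short Python sketch below reproduces them; the function name is illustrative, and the Web total is rounded to exactly 1 billion even though the cited estimate is "over 1 billion," so the results are approximations.

```python
def unindexed_share(indexed: int, total: int) -> float:
    """Return the percentage of documents NOT covered by an index."""
    return 100.0 * (1 - indexed / total)

# Figures quoted in the text (1999 estimates); the total is rounded.
WEB_TOTAL = 1_000_000_000

print(round(unindexed_share(140_000_000, WEB_TOTAL)))  # AltaVista: 86
print(round(unindexed_share(110_000_000, WEB_TOTAL)))  # HotBot: 89

# Share of the 14-document sample site that HotBot had indexed.
print(round(100 * 4 / 14))  # 29
```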

Based on these numbers, it is obvious that an Intranet search engine must be incorporated into a Web site to ensure complete indexing. However, before installing an Intranet search engine, you should carefully consider 1) the types of queries that will be performed and the types of results that you want returned, 2) the ease of maintenance and installation, and 3) the price.

Queries/Results

The queries that can be submitted to Intranet search engines and the results returned by them can vary drastically. Below are some important features that should be considered. Good search engines should have the ability to:

Maintenance and Installation

The tools for installing and maintaining an Intranet search engine vary drastically from one search engine to the next. Commercial versions are typically installed and maintained through graphical user interfaces (GUIs), whereas non-commercial versions are typically maintained by editing configuration files.

Price

Below is a cost comparison of four Intranet search engines. The price for these tools can range from free to very expensive. As shown in this table, the price for commercially available search engines is typically based on the number of documents that can be indexed.

Search Tool                                 Num. of documents     Cost
                                            that can be indexed
WebGlimpse                                  Unlimited             Free
  http://glimpse.cs.arizona.edu/webglimpse
Harvest                                     Unlimited             Free
  http://harvest.transarc.com
AltaVista Search Intranet                   3,000                 Free
  http://altavista.software.digital.com     100,000               $29,995
                                            1,000,000             $99,995
InfoSeek UltraSeek Intranet Server          10,000                $4,995
  http://www.ultraseek.com                  >10,000               Contact their sales group

PREPARING PAGES FOR SEARCHING

When formatting or editing pages for the Web, keep search results in mind. A properly formatted document is indexed more accurately and, consequently, can be searched more effectively. In particular, this section explains how search engines index the information in specific HTML tags. Knowing this, you will be able to create and edit more searchable documents.

Page Titles

Most search engines rank the information in the title of a document higher than the information found in the body of a document. Hence, it is important that a descriptive title be provided for each document. The title should provide a little context as well as a specific topic for the document. For example, if this document were placed on the Web, either one of the following titles would be accurate. However, the longer title tells the user exactly what to expect when this document is viewed.
<title>The Importance of Intranet Search Engines 
and Techniques to Improve Search Results</title>

Or 

<title>Search Engines</title>

Meta Descriptions and Keywords

Many search engines display the meta description as part of the results page. If a meta description is not provided, many search engines display the first 50-70 words of a document instead. Unfortunately, the header of a document might contain only information about the author. When a meta description is provided, a search engine is more likely to display a summary of the document rather than simply its first few sentences. Below is an example of a meta description for this document.
<META NAME="description" CONTENT="Description of the importance 
of Intranet search engines, preparing pages for searching, and how 
to build more powerful searches.">
Meta Keywords are also an important part of a Web page. A good set of keywords should cover the topics mentioned in the document. Below is an example of meta keywords for this document.
<META NAME="keywords" CONTENT="Intranet, search engines, 
Web, Internet, indexing, retrieval, Boolean operators">
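
To make the fallback behavior concrete, here is a minimal Python sketch of how an indexer might choose the text shown on a results page. It uses the standard-library html.parser module; the class and function names are hypothetical, the 50-word limit is simply the lower bound mentioned above, and real search engines may behave differently.

```python
from html.parser import HTMLParser

class SummaryParser(HTMLParser):
    """Collect the title, named meta tags, and visible words of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.body_words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            # Store meta tags by lowercase name, e.g. "description".
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.body_words.extend(data.split())

def result_snippet(html, max_words=50):
    """Prefer the meta description; otherwise use the first words of the page."""
    parser = SummaryParser()
    parser.feed(html)
    description = parser.meta.get("description")
    if description:
        return description
    return " ".join(parser.body_words[:max_words])
```

Under this sketch, a page whose meta tag has a misspelled NAME attribute is treated the same as a page with no description at all, so the fallback text is used.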

Headings

Many search engines also use headings (i.e., <h1> through <h6>) to rank the relevance of a document for a particular query. They assume that words in headings are more important than the words in text. Hence, when possible, place main concepts and ideas into HTML heading tags.
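
The idea that words in titles and headings count for more than words in body text can be sketched as a weighted term count. The Python example below is illustrative only: the weights are invented for the demonstration (search engines do not publish theirs), and the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights chosen for illustration only.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class FieldParser(HTMLParser):
    """Collect lowercase words separately for title, headings, and body."""

    def __init__(self):
        super().__init__()
        self.words = {"title": [], "heading": [], "body": []}
        self._field = "body"

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._field = "title"
        elif tag in HEADING_TAGS:
            self._field = "heading"

    def handle_endtag(self, tag):
        if tag == "title" or tag in HEADING_TAGS:
            self._field = "body"

    def handle_data(self, data):
        self.words[self._field].extend(re.findall(r"[a-z0-9]+", data.lower()))

def score(html, term):
    """Sum the occurrences of term, weighted by the field they appear in."""
    parser = FieldParser()
    parser.feed(html)
    term = term.lower()
    return sum(FIELD_WEIGHTS[field] * words.count(term)
               for field, words in parser.words.items())

doc = "<title>Search Engines</title><h1>Indexing</h1><p>Search tips and indexing notes.</p>"
print(score(doc, "search"))    # one title hit plus one body hit
print(score(doc, "indexing"))  # one heading hit plus one body hit
```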

Register your URL with the Search Engines and Directories

The final step in preparing pages for searching is to plan and implement an awareness building campaign for your Web site. This should include at a minimum submitting your URL to the major search engines and directories for indexing. One easy approach is to use a commercial service such as Submit-It (http://www.submit-it.com).

HOW TO BUILD MORE POWERFUL SEARCHES

Despite differences in search engines, they have many searching characteristics in common that can be used to build more powerful searches. Below are three general search tips that will result in more relevant documents being returned.
  1. Perform "phrase searching." Sometimes the order of the search terms matters. By using phrase searching, you can greatly reduce the number of documents that match a search query. For example, if you phrase searched for "The Golden Gate Bridge," you would get only documents that contain all four words in that order.

  2. Use specific keywords as opposed to general ones. For example, "Purple Martins" will return much more specific results than "birds."

  3. Incorporate Boolean operators into your search. Boolean operators allow logical thought to be expressed as algebra. Below is a list of Boolean operators and other search features that will help produce more powerful search expressions.

AND

Joining search terms with the AND operator tells the search engine that only documents containing all the terms should be returned. For example (heart AND transplant) finds documents with both the word heart and the word transplant. Note: On some search engines, a plus sign (+) can be used to indicate an AND operation.

OR

Joining search terms with the OR operator tells the search engine that documents containing any of the terms or phrases should be returned. For example (nearsighted OR myopic) finds documents containing either the word nearsighted or the word myopic. The returned documents could contain both of the keywords or just one.

NOT

The NOT operator excludes unwanted documents containing the specified terms or phrases. For example, (heart AND attack NOT transplant) would find documents on heart attacks but would not return documents on heart transplants. On some search engines, a minus sign (-) can be used to indicate a NOT operation. In AltaVista, the NOT operator cannot stand alone; it must be combined with another operator such as AND or OR. In AltaVista, the query above would be phrased (heart AND attack AND NOT transplant).
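
The AND, OR, and NOT operators correspond to set intersection, union, and difference over the documents that contain each term. The Python sketch below builds a tiny inverted index and evaluates the example queries; the five documents and the function name are made up for illustration, and real engines use far more elaborate indexes.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical mini-collection.
docs = {
    1: "heart transplant surgery",
    2: "heart attack symptoms",
    3: "heart attack after transplant",
    4: "nearsighted vision",
    5: "myopic vision",
}
index = build_index(docs)

# heart AND transplant: documents containing both words.
both = index["heart"] & index["transplant"]

# nearsighted OR myopic: documents containing either word.
either = index["nearsighted"] | index["myopic"]

# heart AND attack AND NOT transplant: remove documents with the unwanted word.
excluded = (index["heart"] & index["attack"]) - index["transplant"]

print(sorted(both), sorted(either), sorted(excluded))  # [1, 3] [4, 5] [2]
```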

Wild Cards/Word Stemming

When used at the end of a word, the asterisk (*) functions like a wild card. It broadens a search to include extensions and plurals of the word. For example: consult* would match consults, consultant, consulted, and consulting.
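
A trailing asterisk can be thought of as a prefix match against the words in the index. The Python sketch below shows this under two assumptions that are not stated above: the bare word itself also matches, and unrelated words that merely share earlier letters (such as "console") do not. The vocabulary and function name are hypothetical.

```python
def expand_wildcard(pattern, vocabulary):
    """Return the vocabulary words matched by a trailing-asterisk pattern."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return sorted(word for word in vocabulary if word.startswith(prefix))
    # Without an asterisk, only an exact match counts.
    return [pattern] if pattern in vocabulary else []

vocabulary = {"consult", "consults", "consultant", "consulted",
              "consulting", "console"}
print(expand_wildcard("consult*", vocabulary))
```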

NEAR

The NEAR operator finds documents in which both specified keywords appear close to each other. For example, (constitution NEAR "United States") would find documents containing the phrase "the Constitution of the United States" or "United States Constitution". In Lycos, the words must appear within 25 words of each other in the result documents; in AltaVista, they must appear within 10 words of each other.
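
A proximity test like NEAR can be sketched by comparing word positions. The Python function below handles single-word terms only and treats "within N words" as a position difference of at most N; both choices are simplifying assumptions, as are the function name and sample texts, and the exact counting rules of each search engine may differ.

```python
def near(text, term_a, term_b, window):
    """True if term_a and term_b occur within `window` words of each other."""
    words = [w.strip('.,;:"()').lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

close = "The Constitution of the United States was signed in 1787."
# 19 filler words put the two terms 20 positions apart.
far = "constitution " + "filler " * 19 + "states"

print(near(close, "constitution", "states", 10))  # True
print(near(far, "constitution", "states", 25))    # True with a 25-word window
print(near(far, "constitution", "states", 10))    # False with a 10-word window
```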

Parentheses

Parentheses can be used to ensure that the operators are evaluated in the desired order. For example, the parentheses in the query ("Lasik surgery") AND (astigmatism OR nearsighted) will ensure that the OR operation is performed before the AND operation. This query would find documents with the phrase Lasik surgery, and either astigmatism or nearsighted or both.
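
The effect of grouping can be seen by treating each term as the set of documents that contain it. In the Python sketch below the document ids are invented for illustration; the point is only that moving the parentheses changes which documents are returned.

```python
# Hypothetical document ids for each term.
lasik = {1, 2, 3}        # documents containing the phrase "Lasik surgery"
astigmatism = {2, 7}
nearsighted = {3, 8}

# ("Lasik surgery") AND (astigmatism OR nearsighted)
grouped = lasik & (astigmatism | nearsighted)

# ("Lasik surgery" AND astigmatism) OR nearsighted
regrouped = (lasik & astigmatism) | nearsighted

print(sorted(grouped))    # [2, 3]
print(sorted(regrouped))  # [2, 3, 8]
```

In the second grouping, document 8 is returned even though it never mentions Lasik surgery, which is why the parentheses in the original query matter.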

HTML Tags

Many search engines also incorporate features to restrict searches to specific parts of a Web page. For example, typing title:hypertension will retrieve only documents that have the word hypertension in their title. Below is a partial listing of other words that can be specified on various search engines.

url: Returns documents containing the specified URL.
image: Detects image files (GIF, JPEG, etc.)
link: Returns documents containing a link to the specified URL.

Find Similar

Excite and Magellan currently support a feature that searches the Web based on an already retrieved document rather than on keywords. Instead of using keywords, the search engine can use a document just viewed as an example in the next search. The new search should then find documents that are very similar to the one previously viewed. In Excite this feature is called "MORE like this link" and in Magellan this feature is called "Find Similar."

CONCLUSION

Search engines have become an essential tool for both the Internet and Intranets. A resourceful Web site is useful only if people can find information quickly and efficiently. While the Web and search engines will continue to evolve, this study has indicated the importance of Intranet search engines, how to prepare pages for searching, and how to build more powerful searches.


