IRSCLibraries

LIS2004: Web Search Engines

Lesson 3: Web Search Engines

Upon completion of this lesson, the student will:
  1. recognize some of the many sources of free information on the Internet,
  2. understand how search engines work,
  3. understand the basic search features available from most search engines,
  4. explain how meta-search engines operate, and
  5. develop appropriate Boolean search statements for Web search engines.

Introduction: 

As presented in the Course Introduction, the World Wide Web is a subset of the Internet, linking the information world with hypertext. The Web is currently the service that most people use to access Internet resources and services.

Because the Web is not indexed in any standard way, finding relevant information often seems an impossible task. There are several basic types of search tools that may be used to locate web resources: search engines, meta-search engines, metasites, and directories. The following chart details the differences between these search tools and provides examples of when to use each.

Internet Search Tools

| Search Engines | Meta-Search Engines | Metasites | Directories |
| --- | --- | --- | --- |
| Database generated by computer program | Searches multiple databases generated by other search engines | Compiled by humans | Compiled by humans |
| Index a large percentage of web resources | Search databases compiled by general search engines | Coverage limited to specific subject or file format; may index the "deep web" | Limited coverage |
| Use keywords for precise searches | Use keywords, but search precision is sacrificed | Use keywords for precise searches | Allow browsing by subject; often provide a search feature, which searches the directory's limited database |
| Use for specific, focused searches, narrow topics | Use if you have a specific term or want to see a sample of what's available on a topic | Use for specific, focused searches on a particular topic | Use for general searches, broad topics, when you want the best quality sites |

Because the Web is constantly changing, web sites and documents appear, are deleted, or are moved to a different location each day. In this dynamic environment, search engines can be the most efficient way of locating information on a specific topic, since they provide access to immense, continuously updated databases of Internet resources. There are hundreds of search engines designed to help you find information, whether you are looking for a topic of personal interest or material for a scholarly research project.

Using search engines effectively may seem intimidating since new search engines appear frequently and existing engines often change their search interface and format. Though there is at present no consistent standard which governs search engines, they do share many basic features which allow the searcher to retrieve relevant information.

This lesson introduces general search engines and meta-search engines, and Lesson 4 covers specialized search engines and subject directories.

Free Resources Available Via the Web

The number and type of resources available through the Internet increases daily. The following types of information are usually free to any Internet user:

  • Current events from newspapers, current issues of magazines, and news wire feeds
  • Corporate information, including annual reports, product information, and stock quotes
  • Government information such as current laws, regulations, court decisions, and information from local, state, and federal government departments and agencies
  • Ready reference material, including dictionaries, some encyclopedias, statistical sources, and other quick answer sources
  • Bibliographic information from library OPACs (Online Public Access Catalogs). Books and other materials located in remote catalogs can often be borrowed from a local library via interlibrary loan.
  • Bibliographic information from various disciplines, including:
    • PubMed, which offers bibliographic references and abstracts to articles from over 4800 biomedical periodicals
  • Texts of books in the public domain (generally books published more than 75 years ago, which are not protected by copyright laws) from sites such as:
    • Project Gutenberg, the oldest producer of free electronic books, currently offering more than 18,000 texts
    • The Camelot Project, which offers public domain literature relating to the Arthurian legends
  • Material on popular culture, such as cinema, television, and sports
  • An increasing number of websites from colleges, universities, and associations, which post information ranging from student research papers to scholarly works by professors and others who are experts in their subject fields
  • Postings to discussion groups, asking or answering specific questions on a particular topic

Articles from some current issues of popular and scholarly journals may be found through searchable databases such as FindArticles.com. In addition, there are many electronic journals freely available via the Web. However, most academic research will require access to journal articles that are only available through library subscription databases. These databases will be discussed in Lesson Five.

How Do Search Engines Work?

Most search engines use a computer program called a "spider" to collect information and index web resources. Sometimes called "webcrawlers" or "robots", these computer programs crawl through websites on the Internet, gathering information from all the pages of a website. The spider returns the information to a central database and then indexes the information it has gathered. When you use a search engine, you are searching the database compiled and indexed by the spider.
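
The crawl-then-index process described above can be sketched in a few lines. This is only a toy illustration, not how any real search engine is built: the "web" here is a hypothetical in-memory dictionary (`TOY_WEB`), and the URLs and function name are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("search engines index the web", ["a.example/about", "b.example/"]),
    "a.example/about": ("spiders crawl web pages", ["a.example/home"]),
    "b.example/": ("meta search engines query other engines", []),
}

def crawl_and_index(start_url, web):
    """Visit every page reachable from start_url and build a word -> URLs index."""
    index = defaultdict(set)
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Index step: record which page each word was found on.
        for word in text.lower().split():
            index[word].add(url)
        # Crawl step: follow links to pages not yet visited.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl_and_index("a.example/home", TOY_WEB)
print(sorted(index["web"]))  # ['a.example/about', 'a.example/home']
```

Searching the index, rather than the live pages, is what makes a query fast: the lookup `index["web"]` never touches the pages themselves.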

While all search engines rely on spiders to collect and index information, each performs its tasks in a slightly different way. Each search engine has its own search interface and uses different criteria for matching searches with documents. Each may also differ in terms of search speed and how it ranks results in order of relevance. Google, for example, publishes an explanation of how it returns its results.

Searching would be easier if the search engines used a common standard. However, each search engine operates a little differently, and each search engine database contains a large number of unique documents, with limited overlap. Therefore, it is a good idea to search using more than one search engine to be sure you have retrieved most of the relevant information available on your topic.


Relevancy and Search Terms

A search is performed by submitting keywords in the search box. The search engine then compiles a list of websites that contain these terms. The order of these sites is often determined by relevancy (i.e., how closely the site matches the query). Search engines look at the location and frequency of search terms to help determine relevancy. The closer to the top of a web page a search term appears, the higher the ranking of that page. A page that contains a search term in the title or in the first few paragraphs of text will be judged more relevant than one in which the search term appears toward the end of the document.

Search engines also look at the number of times search terms appear in the text of the website. Sites with a higher frequency of a search term are determined to be more relevant. Google even looks at font size and boldness to help determine relevancy.
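
As a rough illustration of location and frequency scoring, the sketch below rates a page for a single search term. The weights (3.0 for a title match, and so on) are assumptions chosen for the example; real search engines do not publish their formulas, and the function name `relevance` is hypothetical.

```python
def relevance(term, title, body):
    """Toy score: a title hit counts most, then early position, then frequency."""
    term = term.lower()
    title_words = title.lower().split()
    words = body.lower().split()
    score = 0.0
    if term in title_words:
        score += 3.0                              # assumed weight for a title match
    if term in words:
        first = words.index(term)
        score += 1.0 - first / len(words)         # earlier first occurrence scores higher
        score += words.count(term) / len(words)   # frequency relative to page length
    return score

early = relevance("ozone", "Ozone layer", "ozone depletion harms the ozone layer")
late = relevance("ozone", "Climate notes", "many topics are covered here including ozone")
print(early > late)  # True
```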

Ranking and Popularity

In addition to text-matching techniques, an increasing number of search engines are also using popularity and link analysis as a means of ranking search results.

Ask.com uses a technique called "Subject-Specific-Popularity" to analyze the relationship of sites within a subject community and present relevant results. Sites are ranked based on the number of same-subject pages that reference them.

Google uses PageRank technology to rank the usefulness of a website. Google interprets a link from website A to website B as a vote by site A for site B. The more votes or links a site receives, the more relevant that site is considered to be. In addition to looking at the number of links a site receives, Google also analyzes the sites casting the votes. Votes cast by sites which are themselves major sites, such as CNN, are weighted more heavily than votes from less popular sites. Link analysis is comparable to the time-honored tradition of researchers rating the importance of a study or article by the number of times it is cited elsewhere.
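
The "links as votes" idea can be sketched with a simplified, textbook-style PageRank calculation. This is not Google's production system: the four-page link graph is hypothetical, and the damping factor of 0.85 and the 50 iterations are assumptions commonly used in teaching examples.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a highly ranked page is worth more.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: B receives the most votes.
toy_links = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["B", "C"]}
ranks = pagerank(toy_links)
print(max(ranks, key=ranks.get))  # B
```

Pages A and D receive no links, so they end up with the lowest scores even though they cast votes themselves.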

Sponsored Links

Most major search engines accept paid listings. Some search engines sell commercial spots on the results list so that the buyer's page appears near the top, as if it were one of the best results according to a link analysis. In the best search engines, sponsored links or paid listings are clearly labeled and kept separate from the search results, while still being relevant to the search.

Size

When search engine producers refer to their size, they are usually counting unique URLs as opposed to unique sites, which may contain a number of URLs. The search engine with the largest collection of sites is not necessarily the best search engine, but potentially, the larger the search engine the greater the chance that you will find something. Many of the major search engines and subject directories use another search engine as a provider to run their search site. For instance, Google is currently the search engine behind AOL Search and Netscape Search. This can be confusing, especially since search engines sometimes change providers. A chart provided by Search Engine Watch shows "Who Powers Whom?"
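
The difference between counting unique URLs and counting unique sites is easy to see in a short sketch. The list below is invented for illustration; only the Library of Congress host name comes from this lesson, and the page paths are hypothetical.

```python
from urllib.parse import urlsplit

urls = [
    "http://www.loc.gov/",
    "http://www.loc.gov/hypothetical-page-1",
    "http://www.loc.gov/hypothetical-page-2",
    "http://www.example.org/",
    "http://www.example.org/hypothetical-page",
    "http://www.example.org/hypothetical-page",  # same URL listed twice
]

unique_urls = set(urls)
# Treat each distinct host name as one "site".
unique_sites = {urlsplit(u).hostname for u in unique_urls}

print(len(unique_urls), len(unique_sites))  # 5 2
```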

General Search Features

Most of the major search engines support the following search techniques, although each search engine operates a little differently. To find out which features are supported by a search engine, read the HELP file. There is usually a link to a HELP file near the search box or near the top of the search engine's home page. If it is not in one of these places, try selecting the search engine's Advanced Search option. Often this page will have a HELP file if the basic search screen does not.

Just like other Internet resources, search engines often change their appearance and features with little or no notice.

Bottom Line: If you are not certain which techniques the search engine uses or if your search statement does not work, reread the HELP file.

Case Sensitivity

Some search engines are case sensitive, requiring that proper names and place names be capitalized. In general, when a search statement is entered in all lower case, both lower case and upper case will be retrieved. The reverse is not true. When upper case is used, the search engine will only retrieve the exact match. For example, AIDS will not retrieve the common word, aids.
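
The case rule described above can be expressed as a small function. This is a sketch of the general rule only; individual search engines may behave differently, and the function name is hypothetical.

```python
def case_rule_match(query_term, document_word):
    """Lower-case queries match any capitalization; other queries must match exactly."""
    if query_term.islower():
        return query_term == document_word.lower()
    return query_term == document_word

print(case_rule_match("aids", "AIDS"))  # True
print(case_rule_match("AIDS", "aids"))  # False
```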

Boolean Operators

Most search engines support Boolean searching, allowing AND, OR, and NOT searches. Some search engines require that the Boolean operator be capitalized; others do not, although those not requiring capitalization accept it. Therefore, it is a good idea to capitalize any Boolean operator.

Many search engines use a simplified form of Boolean operator, replacing the operator with a symbol:

  • the + sign for an AND search
    Example: +drinking +driving searches for the words drinking AND driving, in no specific order in the text of the web page.
  • the - sign for a NOT search
    Example: +dolphins -football will search for documents which contain the word dolphins but NOT the word football
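
A minimal sketch of how the + and - symbols might be applied to a page's text follows. It assumes that words without a symbol are simply ignored, which is a simplification; the function name and sample texts are invented for the example.

```python
def matches_plus_minus(statement, text):
    """Return True if text has every +term and none of the -terms."""
    words = set(text.lower().split())
    for token in statement.lower().split():
        if token.startswith("+") and token[1:] not in words:
            return False   # a required word is missing
        if token.startswith("-") and token[1:] in words:
            return False   # an excluded word is present
    return True

print(matches_plus_minus("+dolphins -football", "dolphins swim near the coast"))  # True
print(matches_plus_minus("+dolphins -football", "the dolphins football team"))    # False
```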

Google defaults to an AND search (automatically placing an AND between terms) and uses a - sign to indicate NOT. This means that you do not have to type AND in your Google search statements. However, for explanatory purposes, in this course the AND operator will be included in search examples, and for class exercises you should include this operator in your search statements where applicable.

Search statements combining more than one type of Boolean operator must also use parentheses around synonymous terms. This technique is called nesting. The parentheses tell the search engine to perform that search first. For example, suicide AND (teen OR youth OR adolescent) will search for documents containing any or all of the terms within the parentheses before combining that result with the word suicide.
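
Nesting can be illustrated with sets, where AND is an intersection and OR is a union. The index below is hypothetical: each word maps to an invented set of document numbers.

```python
# Hypothetical index: word -> set of document numbers containing it.
index = {
    "suicide": {1, 2, 5},
    "teen": {1, 3},
    "youth": {2, 4},
    "adolescent": {6},
}

# suicide AND (teen OR youth OR adolescent): the parentheses are evaluated first.
inner = index["teen"] | index["youth"] | index["adolescent"]
nested = index["suicide"] & inner
print(sorted(nested))  # [1, 2]

# Without parentheses, evaluating AND before OR gives a different, broader result.
unnested = (index["suicide"] & index["teen"]) | index["youth"] | index["adolescent"]
print(sorted(unnested))  # [1, 2, 4, 6]
```

Documents 4 and 6 never mention suicide, which is why the un-nested statement retrieves irrelevant results.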

Phrase Searching and Truncation

Most search engines support the use of quotation marks around words, terms or names you want searched as a phrase, i.e., appearing in exactly the order you enter them. For example, "ozone layer depletion" searches for this exact phrase with the words in the order given.

When devising a phrase search, be sure to evaluate the likelihood of your phrase being used by others. For instance, if you were doing a search on the benefits of reading to children, "reading children" would not return results as useful as those for "reading to children." Phrase searching is the one time you may use minor words like of, in, to, etc.

Some search engines automatically look for singular and plural forms of terms as well as -ing or -ed endings. Others use the asterisk (*) to specify that all endings of the root term be searched. As was discussed in Lesson Two, this technique is called truncation.
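
Phrase searching and truncation can both be sketched with regular expressions. This is an illustration of the two techniques, not of any particular search engine; the function names and sample sentences are invented.

```python
import re

def phrase_match(phrase, text):
    """True if the words of phrase appear adjacent and in order in text."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def truncation_match(term, text):
    """Expand a trailing * to any word ending, e.g. clon* finds clones and cloning."""
    root = term.rstrip("*")
    return re.findall(r"\b" + re.escape(root) + r"\w*", text, flags=re.IGNORECASE)

sentence = "The benefits of reading to children are clear"
print(phrase_match("reading to children", sentence))  # True
print(phrase_match("reading children", sentence))     # False
print(truncation_match("clon*", "Cloning produced clones of a cloned sheep"))
# ['Cloning', 'clones', 'cloned']
```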

Field Searching

Some search engines allow you to limit your search to specified fields, such as the title of the document, a word from the URL, the domain name, the type of file, and the availability of such features as images, sound, and video. In the following table, four types of field searching are demonstrated (title, URL, domain, and file type) in addition to phrase searching and truncation. All of these syntaxes will work in Google except for the truncation symbol (Google now uses stemming technology to automatically truncate for you).

| Goal | Common Syntax | Example Searches | Syntax for Examples |
| --- | --- | --- | --- |
| To limit search to an exact phrase (i.e., words together in order) | " " | You're looking for the phrase health care reform. | "health care reform" |
| To find plurals or variations of a root word (truncation) | * | You want to find any of the following terms: clones, cloned, cloning, etc. | clon* |
| To specify that your search term should be found in the title of the Web page | intitle: | You're looking for sites that have tomb raider in their Web page titles. | intitle:"tomb raider" |
| To specify that your search term should be found in the URL of the Web page, including paths and subdirectories | inurl: | You're looking for sites that have NASA in their URLs. | inurl:nasa |
| To limit your results to a particular domain or site | site: | 1) You only want educational sites (i.e., the domain is .edu). 2) You only want to search within the Library of Congress's website (http://www.loc.gov). | 1) site:edu 2) site:loc.gov |
| To limit results to a particular type of document (i.e., Word document, Excel spreadsheet, PDF, etc.) | filetype: | You only want Microsoft Word documents. | filetype:doc |

The next table demonstrates how these techniques can be combined to create effective search statements.

| Search Query | Search Techniques | Search Statement |
| --- | --- | --- |
| You want government sites that discuss bioterrorism | domain searching | bioterrorism site:gov |
| A friend told you about a great site on elephants that had wildlife in the URL and Africa in the Web page title | URL searching, title searching | elephants inurl:wildlife intitle:africa |
| You need an Excel document with statistics on international adoption | Boolean operator, phrase searching, file type searching | statistics AND "international adoption" filetype:xls |
| You are looking for sites that relate to children who have ADHD | nesting, Boolean operators, phrase searching, truncation | (ADHD OR "attention deficit hyperactivity disorder") AND child* |
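
To show how field limits such as site:, inurl:, intitle:, and filetype: might be separated from ordinary keywords and then applied to a page, here is a minimal sketch. It is not any search engine's actual parser; the function names are hypothetical, and it relies on Python's `shlex.split` to keep quoted phrases together (which means a statement containing an unmatched quotation mark or an apostrophe would raise an error).

```python
import shlex
from urllib.parse import urlsplit

FIELD_OPERATORS = ("intitle", "inurl", "site", "filetype")

def parse_statement(statement):
    """Split a search statement into field limits and plain keywords or phrases."""
    fields, keywords = {}, []
    for token in shlex.split(statement):
        name, sep, value = token.partition(":")
        if sep and name in FIELD_OPERATORS:
            fields[name] = value
        else:
            keywords.append(token)
    return fields, keywords

def page_matches(fields, url, title):
    """Check a page's URL and title against every field limit."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    checks = {
        "site": lambda v: host == v or host.endswith("." + v),
        "inurl": lambda v: v.lower() in url.lower(),
        "intitle": lambda v: v.lower() in title.lower(),
        "filetype": lambda v: parts.path.lower().endswith("." + v.lower()),
    }
    return all(checks[name](value) for name, value in fields.items())

fields, keywords = parse_statement("bioterrorism site:gov")
print(fields, keywords)  # {'site': 'gov'} ['bioterrorism']
print(page_matches(fields, "http://www.loc.gov/", "Library of Congress"))  # True
```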

These are just a select sample of search techniques commonly available for search engines. For additional search features, read the HELP file of the search engine you are using.

Advanced Search

Many search engines offer an advanced search mode. In advanced search, you are able to perform many of the search techniques presented in Module 3 by utilizing designated pull-down menus instead of correct syntax to limit your search. Since syntax will vary between search engines, using advanced search often saves time and frustration. However, keep in mind that not all search techniques will be available in advanced search. The following is a screen capture of Google's Advanced Search Screen:

[Screen capture: Google Advanced Search]

Notice that the form allows you to use Boolean Operators as follows:

AND = all these words
OR = one or more of these words
NOT = any of these unwanted terms

Also, "this exact wording or phrase" is equivalent to using quotation marks to designate a phrase.
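
The mapping between the form's boxes and Boolean syntax can be sketched as a function that assembles a search statement from the four kinds of input. The parameter names are hypothetical and only loosely mirror the form labels; this is not how Google itself builds queries.

```python
def build_statement(all_words=(), exact_phrase="", any_words=(), unwanted=()):
    """Combine advanced-search style inputs into one Boolean search statement."""
    parts = []
    if all_words:
        parts.append(" AND ".join(all_words))            # all these words
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')                # this exact wording or phrase
    if any_words:
        parts.append("(" + " OR ".join(any_words) + ")") # one or more of these words
    parts.extend(f"NOT {word}" for word in unwanted)     # unwanted terms
    return " AND ".join(parts)

print(build_statement(all_words=["suicide"], any_words=["teen", "youth", "adolescent"]))
# suicide AND (teen OR youth OR adolescent)
```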

For more information on Google's Advanced Search features, check out the Advanced Search Help Guide.

Be sure to check out the advanced search options of your favorite search engines.

3E: Comparison of Major Search Engines

Some of the most popular search engines are listed below, along with links to their help files:

Major Search Engines

To compare the features of the major search engines, try the following links:

Spotlight on Google

According to Nielsen NetRatings, in August of 2006, Google conducted 49.2% of all online searches in the U.S. (Yahoo was the next busiest search engine with 23.8% of searches). Google's size, uncluttered interface, and fast searching have made it easily the most popular search engine. The following are examples of additional features that make Google stand apart:

  • Google Images provides access to billions of images
  • Google News provides access to nearly 10,000 worldwide news sources
  • Google's Language Tools provide translation services for text selections or entire Web pages
  • Google Scholar indexes scholarly information including peer-reviewed articles, theses, books, preprints, abstracts, and technical reports
    Note: Some full-text results retrieved by Google Scholar will only be available for a fee

For the full selection of Google features, click on the more link at the top of the screen:

[Screen capture: Google's "more" link]

3F: Meta-Search Engines

A special kind of search engine called a meta-search engine allows you to query several search engines at once. Instead of doing a search itself, a meta-search engine sends your request to other search engines, compiles the results, and displays them for you. This process can be much faster than querying several search engines separately.

Meta-search engines do not own a database of web pages; they use and deliver results from the databases and search programs of each of the individual search engines they query. A meta-search engine acts as an intermediary: it passes your search through, gathers the responses, and then gives you a report from several engines at once. As well as saving time, this kind of search engine can give you an overview of the kind of documents you may find using your search terms, and may even give you exactly what you need if you are searching for a unique term or phrase.

There are some disadvantages in relying exclusively on meta-search engines. None of the meta-search engines query all of the largest search engines. If a search connection takes too long, one or more of the search engines may time out and produce no results. If you submit a complicated search to a meta-search engine that one of the queried tools does not "understand", you may get no hits at all from that engine. However, you will usually get results from another tool that supports your search strategy.

Meta-search engines retrieve only the first 10-50 hits from each search engine, so the total number of hits may be less than you would retrieve with a direct search on a single search engine. Thus, meta-search engines do not eliminate the need to learn how to search at least one general web search engine intelligently.
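
The behavior described in this section (send one query to several engines, keep only the first few hits from each, skip engines that time out, and merge the rest) can be sketched as follows. The three "engines" are stand-in functions invented for the example, and the merge rule (pages found by more engines listed first) is an assumption, not a description of Dogpile or any other real service.

```python
def meta_search(query, engines, per_engine_limit=10):
    """Query each engine, keep its top hits, and merge the results."""
    merged = {}
    for name, search in engines.items():
        try:
            hits = search(query)[:per_engine_limit]
        except TimeoutError:
            continue  # an engine that times out contributes no results
        for url in hits:
            merged.setdefault(url, []).append(name)
    # Pages returned by more engines come first; ties keep first-seen order.
    return sorted(merged.items(), key=lambda item: -len(item[1]))

# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["u1.example", "u2.example", "u3.example"]

def engine_b(query):
    return ["u2.example", "u4.example"]

def engine_slow(query):
    raise TimeoutError("no response")

engines = {"engine_a": engine_a, "engine_b": engine_b, "engine_slow": engine_slow}
results = meta_search("binge drinking", engines, per_engine_limit=2)
print([url for url, _ in results])  # ['u2.example', 'u1.example', 'u4.example']
```

Note that `u3.example` never appears: the per-engine limit cut it off, which is why a direct search on a single engine can return hits a meta-search misses.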

Each meta-search engine has its own interface and method for letting you choose engines to search, so it is important to consult the "Help" files for each meta-search engine.

Links to Major Meta-Search Engines

Some of the most popular meta-search engines are listed below, along with links to their help files:

Meta-Search Engines
  • Dogpile (Dogpile Help)
  • iBoogie (iBoogie Technology)
  • SurfWax (SurfWax Technology)
  • Clusty

A Meta-Search Engines chart, comparing search features and results, is available from the University of California Berkeley Library.

Sample Searches in Google, Dogpile, and Clusty

Google's and Clusty's Basic Searches and Dogpile's Advanced Search were used to search for information on the following topic, first introduced in Lesson Two:

What are the consequences of binge drinking by college students?

Search #1

| Search Engine | Search Statement | Results |
| --- | --- | --- |
| Google | college AND students AND "binge drinking" | 535,000 |
| Dogpile | college AND students AND 'binge drinking' | 79 |
| Clusty | college AND students AND 'binge drinking' | 103,700 |

Discussion: Many of the websites in the top 20 of all three search engines were the same, although they also had unique sites in the top 20. Top listed sites appear to be major sites on this topic. The search engines' results provided brief annotations of the sites. The size of Google's database is very evident in this search.

Google included links to three scholarly articles at the top of its list. Dogpile offered eight suggestions in its "Are you looking for?" box, including "Binge Drinking on College Campuses" and "Effects of Binge Drinking." Clusty grouped its results into clusters under subheadings. Each of these links leads to additional relevant sites.

For each Dogpile and Clusty site listed, there is a message letting you know from which major search engines these results were obtained. Dogpile compiles from Google, Yahoo, MSN, and Ask.com. Clusty compiles from YahooNews, MSN, Ask.com, and other sources.

Search #2

| Search Engine | Search Statement | Results |
| --- | --- | --- |
| Google | "binge drinking" consequences site:edu | 34,800 |
| Dogpile | 'binge drinking' consequences; Include results from .edu | 74 |
| Clusty | "binge drinking" consequences site:edu | 4,160 |

Discussion: The addition of the search term "consequences" and the educational domain limit made this a more specific search than Search #1. Prevalent among results from the search engines were studies and survey reports. For each search engine, the new top 20 sites were different from the top 20 sites retrieved in the first search. In addition, only three top 10 sites were retrieved by all three search engines in this revised search. Three sites appeared in two of the results lists, but not in the third.

Google, Dogpile and Clusty retrieved PowerPoint and PDF documents in addition to websites. This time Google did not include links to scholarly articles at the top of the page.

Dogpile had 2 unique sites; Clusty and Google each had 6 unique sites.

Search #3

| Search Engine | Search Statement | Results |
| --- | --- | --- |
| Google | "binge drinking" AND intitle:"college students" | 826 |
| Dogpile | 'binge drinking' AND intitle:'college students' | 77 |
| Clusty | "binge drinking" AND title:"college students" | 149 |

Discussion: This search provided many unique websites. Many of the top ranked sites in the three searches were the same as the top ranked sites in the first Google search.

Dogpile retrieved many Web pages that appear to be ads for college students – even an ad for an alcoholic beverage! Using field limiting syntax in Dogpile may produce many useless results. All three had only one title in common in the top 10 results.

Search #4

| Search Engine | Search Statement | Results |
| --- | --- | --- |
| Google | "binge drinking" site:org | 239,000 |
| Dogpile | "binge drinking"; Include results from .org | 51 |
| Clusty | "binge drinking" host:".org" | 55,200 |

Discussion: This search retrieved various websites from health advocacy organizations, as well as PBS sites and the like. Many of the top 20 sites in all three searches were the same, although each search engine produced unique results as well. There were ads for commercial sites in Dogpile. This illustrates that search limits only apply to the primary results, not the sponsored ones.

Which Search is the Best?

It would be difficult to state categorically that one of the previous searches was best. However, you can surmise the following from the searches in Google, Dogpile and Clusty:

  • You should search for a topic in more than one search engine or type of search engine. Though there were many of the same web sites retrieved by each search, some of the results from the search engines were still unique to that particular search.
  • Do more than one or two searches on a topic within a search engine. Try alternative terms and synonyms to retrieve different results. Each of the searches retrieved unique web sites. By doing separate domain searches, you can retrieve the most relevant web sites from that domain without having to wade through web sites from other domains.
  • Try combining search terms in different ways. Leave out one concept to try and enlarge search results.
  • If you want specific information, be as specific as possible with your search terms. In Search #2, the search term "consequences" was included, resulting in perhaps the most relevant results for the topic.
  • If available as a search option, use field searching to limit search results to web sites with important terms in the title and to limit to educational, governmental, and organizational sites. You may have to use an advanced version of the search engine to apply field searching.
  • And finally, do NOT rely solely on search engines. As you will discover in Lessons Four and Five, some of the best, most relevant and useful electronic information is not available through web search engines!

Licensed under the Creative Commons Attribution Share Alike 3.0 License

Copyright © 1997-2009 Florida Community College, Learning Resources Standing Committee. Last revised May 2009 by the LIS 2004 Course Revision Committee.