Searching The World Wide Web
Yes, the World Wide Web does contain everything you will ever need to know about anything you are ever likely to be interested in. But how come it is always so difficult to find what you want? Searching for that little nugget of information that you know is out there somewhere can be a frustrating business. What you need are a few techniques to speed up the process and help you hone those information retrieval skills.
But before we start - what we need is a wee bit of background information.
Question: What are my options for searching the web?
Answer: In broad terms, you have two options:
- Search Engines
- Directory Listings
Briefly, let's have a look at what they are and how they work.
Search Engines
Search Engines are created by the use of a computer program called a crawler (sometimes referred to as a spider or robot, or bot), which constantly roams the Internet, visiting and cataloguing the web pages that it finds.
The result of all this constant ‘roaming around’ is a searchable catalogue with ‘lots and lots’ of web pages in it, which is a good thing because the information you are searching for is likely to be on one of these pages (see footnote 1). The downside is that you might have to sift through a lot of junk to find it.
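To make the ‘searchable catalogue’ idea a bit more concrete, here is a minimal sketch in Python of the kind of word index a crawler might build. The page addresses and text are made up for illustration, and real search engines are far more elaborate than this.

```python
from collections import defaultdict

# Hypothetical pages a crawler might have visited (address -> page text).
pages = {
    "http://example.org/a": "Disability organisations in Scotland",
    "http://example.org/b": "Tourist guide to Scotland",
    "http://example.org/c": "Disability rights news",
}

def build_index(pages):
    """Map each lowercase word to the set of addresses that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index

index = build_index(pages)
# Looking up a word returns the pages it points to, not the pages themselves.
print(sorted(index["scotland"]))
```

The search itself is then just a lookup in the index, which is why a search engine can answer quickly without re-reading every page.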
Major search engines include AltaVista, Excite, HotBot, Infoseek, Lycos and, more recently, Google. (Links to these sites can be found at the bottom of this page.)
Directories
Directories are not really search ‘engines’ but categorised lists of sites with information organised by subject. In most cases this has been compiled by real human beings (i.e. impossibly young folk sitting in swivel chairs in front of computers drinking cans of a popular fizzy drink).
Web sites get listed (or not) on the site when people submit them via a web based request form. Each site is assessed, and accepted or rejected, on the basis of ‘who knows what?’ More and more directories are moving towards a system where you only have a chance of being included if you pay a fee - so now largely we do know what.
The most famous and widely used directory site is Yahoo. When you visit Yahoo you are presented with a list of general topic headings. From this page you choose the topic that best describes the information you are after. You are then presented with a list of sub-categories; again you choose the most appropriate category, and so on down the hierarchy until you find what you are after.
Other directory-based sites include LookSmart, Infoseek and Excite. Both Infoseek and Excite also offer a search engine. (Again, links to directories can be found at the bottom of this page.)
Advantages and Disadvantages of Directories
Directory based sites tend to be much smaller than Search Engines, but they are likely to contain a more ‘select’ set of sites and can be useful as a starting point for a search. I would recommend that you start your search with a directory site like Yahoo, which will help you find sets of sites related to the topic you are interested in.
Search Tips
That’s the background stuff finished. It’s time to find out the techniques that will help you find the information you are after. Most of these tips are relevant to search engines rather than directory sites. To a large extent directory based sites are self-explanatory; in a later section I will cover what some of the major directory sites have to offer - maybe.
Windows of Opportunity
An article by the NEC Research Unit (June 1998) estimated that “at the lower bound” there are currently 320 million indexable pages on the WWW. It also reported that the major search engines only index a fraction of the total number of these documents: “No engine indexes more than about one third of the publicly indexable Web” (see footnote 2).
One of the conclusions of the NEC Research Unit was that combining the results of multiple engines can significantly increase coverage. Combining six engines covers approximately 3.5 times as much of the Web as one engine on average.
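The arithmetic behind a claim like this can be sketched in a few lines of Python. The engine names and page numbers below are invented purely for illustration; they are not the NEC figures.

```python
# Hypothetical page IDs indexed by three imaginary search engines.
engines = {
    "engine_a": {1, 2, 3, 4},
    "engine_b": {3, 4, 5, 6},
    "engine_c": {6, 7, 8, 9},
}

def coverage_gain(engines):
    """How many times more pages the combined engines cover
    than a single engine does on average."""
    combined = set().union(*engines.values())
    average_single = sum(len(pages) for pages in engines.values()) / len(engines)
    return len(combined) / average_single

print(coverage_gain(engines))  # 9 pages combined / 4 on average = 2.25
```

The gain depends on how little the engines overlap: the less they index the same pages, the more you get from combining them.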
With that in mind here is my first tip: When I am searching for something I want quick results. Once I have typed in my search term I don’t want to wait while it goes away, queries its database and sends back a page to me.
To speed up the search I open up another browser window, jump to another search engine and repeat my query. With the second search responding to my request I repeat the process, opening another search engine, and so on. I may have five or six windows, with as many different search engines, on the go at once. Some are quicker to respond than others and I can switch between them, using the window menu of my browser, to check the results as they come up.
The 'opening more than one window at a time' trick is a good habit to get into, even when you are not using search engines. Even if the individual pages don’t load any faster than before - getting more windows open and working for you allows you to be more productive.
It is also useful to try to get to know your search engines and note how they perform in relation to your particular area of interest. Search engines don't all index the same parts of the Web - some specialise in indexing particular topic areas. To increase your 'hit rate' it pays to find the 'pack' of search engines that consistently perform effectively for you.
Quicker Please
Another technique that makes all this faster can be found in the latest version of Netscape’s browser. There is no need to remember the full Web site addresses of the most popular web sites. Just type the name of your chosen search engine or directory into the browser's location box, e.g. type in lycos, or excite, or yahoo, and hit the return key - the browser will construct the rest of the address for you.
This will also work in Microsoft Explorer; however, it operates slightly differently: the name you type in is entered as a query into Microsoft's search engine, which returns what it thinks is the most appropriate address. Explorer remembers if you have been to a particular Web site before and will complete what it thinks is the site you are typing in, as you type it. The latest version also gives you a list of sites to choose from, which correspond to the address you are typing.
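The Netscape-style shortcut described above can be sketched roughly as follows in Python. The rules here (add ‘www.’ and ‘.com’ to a bare name) are my assumption about the general idea, not a description of what any particular browser actually does.

```python
def complete_address(typed):
    """Guess a full address from what was typed in the location box.
    The completion rules are an assumption made for illustration."""
    typed = typed.strip()
    if "://" in typed:
        return typed                      # already a full address
    if "." in typed:
        return "http://" + typed          # has a domain, just add the scheme
    return "http://www." + typed.lower() + ".com"   # bare name such as 'lycos'

for name in ["lycos", "excite", "yahoo"]:
    print(name, "->", complete_address(name))
```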
Write It Right: Power of the Plus
What you type in to the search boxes of the various search engines is probably the most significant factor in the success or failure of your search. A few skills need to be developed here - and the first one is an appreciation of ‘the power of the plus sign’.
Here is an example. I want to find information about disability organisations in Scotland. Assuming I have no knowledge of the “power of the plus sign”, I would jump to my favourite search engine and type the words “disability Scotland”. But what I will probably find is not all the pages containing both these words but all pages that contain either the word disability or the word Scotland.
So I just might have to wade through all that touristy stuff about the bonnie banks, of the whatever, before finding what it is I am after. The search engine has assumed that I want to find documentation relating to disability ‘or’ Scotland when in fact what I mean is: “give me pages that contain both disability ‘and’ Scotland”.
In order to make your search request explicit put a plus sign in front of the words:
+disability +Scotland
and now your search should only return pages which contain both of these words.
Don’t leave a space between the plus sign and the word. Search engines vary in the way they implement these techniques, so experiment. For example, you could try leaving out the plus on the first word and getting rid of any spaces between the words and the plus sign:
disability+Scotland
Another little refinement to this technique is the use of the minus sign ‘-’ to tell the search engine “I don’t want this word to appear in the page”:
+disability +Scotland -health
This gives a more specific message: “I want disability and Scotland but I don’t want any web pages that mention the word health”.
Many of the large search engines are becoming a bit smarter these days and are starting to assume that you mean ‘and’ rather than ‘or’, but it can still pay to be explicit.
(What you are doing when using the above technique is constructing what are called ‘Boolean phrases’. A handy term to impress your friends with.)
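For the curious, here is a minimal Python sketch of how a search engine might interpret the plus and minus signs. It is only an illustration of the ‘and’, ‘or’ and ‘not’ logic described above: the function names are my own, words are split on spaces only (so punctuation is not handled), and no real search engine is claimed to work exactly this way.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and plain words."""
    required, excluded, plain = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            plain.append(term)
    return required, excluded, plain

def matches(query, text):
    """True if the page text satisfies the query."""
    words = set(text.lower().split())
    required, excluded, plain = parse_query(query)
    if any(word not in words for word in required):
        return False                       # a '+' word is missing
    if any(word in words for word in excluded):
        return False                       # a '-' word is present
    # With no '+' words, plain words behave like 'or': any one will do.
    return bool(required) or any(word in words for word in plain)

page = "Tourist guide to Scotland"
print(matches("disability scotland", page))    # 'or' behaviour
print(matches("+disability +scotland", page))  # 'and' behaviour
```

The tourist page matches the first query but not the second, which is exactly the difference the plus sign makes.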
What’s that Phrase?
One of the most powerful techniques you can use to speed up your searching is to search for an exact phrase.
For many search engines this is done by putting the phrase in quotes. For example, “Pat's Guide to the West End”, typed into the search field of Infoseek, would return only pages that contain this exact phrase. AltaVista and WebCrawler also work in this way.
Others allow you to choose the option from a menu (HotBot) or by clicking on a radio button (Lycos). I recommend you try this technique out the next time you’re struggling to cut down those three million ‘hits’ that you keep getting.
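The difference between matching words and matching an exact phrase can be shown with a short Python sketch. This is an illustration only: it ignores case and punctuation handling, and the function name is my own.

```python
def contains_phrase(phrase, text):
    """True if the words of the phrase appear in the text
    next to each other and in the same order (case ignored)."""
    phrase_words = phrase.lower().split()
    text_words = text.lower().split()
    n = len(phrase_words)
    return any(
        text_words[i:i + n] == phrase_words
        for i in range(len(text_words) - n + 1)
    )

title = "Pat's Guide to the West End of Glasgow"
print(contains_phrase("Guide to the West End", title))  # same words, same order
print(contains_phrase("West End Guide", title))         # same words, wrong order
```

The second search fails even though every word is on the page, which is why phrase searching cuts down the number of hits so sharply.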
Using Lower and Uppercase characters
When searching for names (e.g. places, people, organisations), you should always capitalise the first letter, e.g. ‘Partick Thistle Football Club’.
If you are not searching for names or proper nouns use lowercase text in your searches. The resulting search is likely to find both upper and lowercase results.
Example: When you search for glasgow, you'll find glasgow, Glasgow, and GLASGOW in your result pages. However, when you search for Glasgow, you'll only see Glasgow in the result pages.
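The rule in the example above can be written down as a small Python sketch. It assumes the behaviour just described (lowercase matches any case, anything else must match exactly); individual search engines may well differ.

```python
def case_match(search_word, page_word):
    """Lowercase search words match any capitalisation;
    search words containing capitals must match exactly."""
    if search_word.islower():
        return search_word == page_word.lower()
    return search_word == page_word

for page_word in ["glasgow", "Glasgow", "GLASGOW"]:
    print(page_word, case_match("glasgow", page_word), case_match("Glasgow", page_word))
```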
Other useful search options offered by most search engines include: searching for a page title; searching for names and searching for people.
There is no consistent method across the search engines for using the techniques I have outlined, which is a bit of a pain. To help you along I have provided guides to some of the major search engines on another page in this section (whether I have published that bit on the Web yet is another matter - but you could have a look).
Think laterally
Another useful tip to pin down the pages you are searching for requires a bit of ‘brainstorming’ on your part. Most people when searching on the web just type in the first couple of words that spring to mind, hoping that this will be enough to bring back the ‘goodies’.
In circumstances where your usual techniques are not working try this approach. Think up as many words and phrases as you can that are related to the topic you are searching for. Think laterally, and have a look at the thesaurus in your word processor (or even the one sitting on your shelf) if you are stuck. Use your results as ‘ammunition’ for your next assault on the search engines.
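If you like, you can even turn your brainstormed word lists into a batch of ready-made queries. The following Python sketch combines every word from one list with every word from the other; the word lists are just examples I made up.

```python
from itertools import product

# Related words for each idea in the search (made-up examples).
related = {
    "disability": ["disability", "disabled", "accessibility"],
    "organisation": ["organisation", "charity", "association"],
}

def query_variations(related):
    """Build one '+word +word' query for every combination of related words."""
    return ["+" + " +".join(combo) for combo in product(*related.values())]

for query in query_variations(related):
    print(query)
```

With three words in each list this gives nine different queries to try.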
Meta-searchers - simultaneous searching of multiple databases.
It is now possible to search multiple search engines simultaneously using tools like Sherlock for the Mac and WebFerret for the PC. Even if you do not have these tools on your computer there are Web sites that can offer you this facility: Dogpile (www.dogpile.com), MetaCrawler (www.go2net.com/search.html), SavvySearch (www.savvysearch.com).
It is well worth checking these out - I mostly use a Mac and frequently do my searching through Sherlock. Each search engine can be added to Sherlock by adding a plug-in for that particular search engine. Tools such as Sherlock, combined with some of the techniques outlined above, can speed up your searching considerably.
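The heart of any meta-searcher is combining several result lists into one. Here is a minimal Python sketch of one way that might be done: pages returned by more engines are listed first, and ties are broken by the best position a page reached. The ranking rule, engine names and addresses are my own assumptions for illustration; I am not describing how Sherlock or any of the sites above actually work.

```python
def merge_results(results_by_engine):
    """Merge ranked result lists, removing duplicates.
    Pages found by more engines come first; ties are broken
    by the best (lowest) position the page reached anywhere."""
    scores = {}
    for results in results_by_engine.values():
        for position, address in enumerate(results):
            count, best = scores.get(address, (0, position))
            scores[address] = (count + 1, min(best, position))
    return sorted(scores, key=lambda a: (-scores[a][0], scores[a][1], a))

# Hypothetical results from three imaginary engines.
results = {
    "engine_a": ["http://example.org/x", "http://example.org/y", "http://example.org/z"],
    "engine_b": ["http://example.org/y", "http://example.org/w"],
    "engine_c": ["http://example.org/y", "http://example.org/x"],
}

for address in merge_results(results):
    print(address)
```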
Get help
One final powerful tip: learn how the search engines you regularly use work. Read the help files, try the advanced search options, experiment with the different options offered. Invest some time up front and you will reap the rewards in the future.
As a general rule it pays to hit that button on search engines which says ‘More Options’ or ‘Super Search’ or ‘Advanced Search’ - it can give you extra tools to refine your search and save you a lot of time in the long run.
What you can't find using traditional search engines.
I read a very good article recently by Ken Wiseman (see footnote 3) called 'The Invisible Web'. Ken estimated that up to 50% of the Web cannot currently be indexed by search engines. So no matter what search strategies we adopt, half of the Web is currently invisible to us. The problem is caused by the move away from storing Web pages as files on a hard disk towards information being stored in databases.
Search engines cannot index the information within these databases. We are talking here about a significant amount of information; currently there are over 1,700 information-rich databases connected to the Web. These databases can, of course, be queried directly - with the result page being constructed 'on-the-fly' and sent back to the browser in response to the query. However, with over 1,700 databases, visiting them all individually could take you quite a bit of time!
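To see why these pages are invisible to a crawler, it helps to look at what 'on-the-fly' means. In the Python sketch below (a made-up database and page layout, using Python's built-in sqlite3 module), the result page does not exist anywhere until somebody asks a question, so there is no file for a crawler to find.

```python
import sqlite3

# A tiny, made-up database standing in for an 'information rich' one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clubs (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO clubs VALUES (?, ?)",
    [("Partick Thistle", "Glasgow"), ("Hibernian", "Edinburgh")],
)

def result_page(city):
    """Build a result page for one query. Nothing is stored on disk;
    the page is constructed only when the query arrives.
    (A real site would also escape the text it puts into the page.)"""
    rows = conn.execute(
        "SELECT name FROM clubs WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    items = "".join("<li>" + name + "</li>" for (name,) in rows)
    return "<html><body><ul>" + items + "</ul></body></html>"

print(result_page("Glasgow"))
```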
Ken Wiseman does offer some hope in relation to making these databases a bit more visible. As I mentioned earlier, Sherlock can make any on-line database searchable due to its plug-in architecture - all that is needed is a plug-in for the appropriate databases. In addition, many of the major search engines are starting to address the 'invisible web' problem by pointing to appropriate databases along with search results.
Footnotes
1. Search engines don't actually hold Web pages but instead an index of words, which point to the appropriate Web pages.
2. By February 1999 the fraction of the web indexed by the major search engines was even lower; the search engine with the largest coverage was Northern Light, which covered less than a fifth of the Web. By September 2000 the number of Web pages was closer to the 1 billion mark and the Web was growing faster than any search engine could index it. The percentage of the Web that any particular search engine is capable of indexing is decreasing rapidly.
3. Ken Wiseman is the District Technology Coordinator for schools in Illinois.
Some useful Search Engines
AltaVista: http://www.altavista.com
Excite: http://www.excite.com
HotBot: http://www.hotbot.com
Infoseek: http://www.infoseek.com
Lycos: http://www.lycos.com
Google: http://www.google.com
Directories
Yahoo: http://www.yahoo.com
LookSmart: http://www.looksmart.com
Infoseek: http://www.infoseek.com





