Search is a very important area of study in computer science. Just think of how often you search for information on the Internet using Google or some other search engine. It's remarkable how much information Google's algorithms search through and how fast they deliver the results.This unit will start to explore different methods of conducting searches.
- Students will understand how binary and sequential searches function and will examine the efficiency of each method
Watch the video, entitled "How Google Search Works, in the assignments box to the right. As you can see in the video when you do a Google search, you aren't actually searching the Web, you're searching Google's index of the Web. Google's spider programs are constantly traversing the web, collecting millions of web pages and organizing them into an index. When you do a Web search Google's algorithms are searching that index.
What's the best algorithm for searching an index? An index is an ordered collection. Think of the index that comes at the back of a textbook. It is organized in alphabetical order. Each entry in the index refers to some page in the book.
To help you think about the problem of searching an index we're going to play a guessing game. The objective of the game is to come up with the most efficient algorithm for guessing a number between 1 and 100, where most efficient means that it takes the fewest number of guesses.
The following widgets are courtesy of Mobile CSP, Trinity College and College of St. Scholastica.
One way to look at this game is that we are searching for a number in a list of numbers. Our search made use of the fact that numbers are ordered. The feedback we received – "too high" or "too low" – was based on that order. If you're still working on figuring out an efficient algorithm, maybe the following widget will give you some ideas as the computer tries to guess your number. Try to observe the algorithm that the widget is using.
There is a very efficient algorithm for the guessing game problem when the order is known, known as the binary search algorithm.
The pseudocode for a binary search is as follows:
- Repeat until your guess is correct or until you run out of numbers in the list.
- Guess that the target number is the middle number in the list.
- If the guess is too high, ----Cut off the top half of the list.
- If the guess is too low, ----Cut off the bottom half of the list.
- If the guess is correct, ----Stop.
- If you ran out of numbers, the target number is not in the list.
What if you had to search a set of data that was not sorted? Binary search won't work in that case. To illustrate this problem, let's try a variation of our guessing game. This time the app will only tell you if your guess is right or wrong, not whether it is too high or too low. Try it.
As you can see from this game, if you don't know the order of the items you are going to search, you have no choice but to search them sequentially if you definitely want to find the secret number
Suppose that I was looking for a letter in any of 26 boxes. The following Pseudocode would apply:
The pseudocode for a sequential search to find the letter "F" is as follows:
- Let x represent the box number to search, initially 1.
- Repeat until you find 'F' or run out of boxes to search
- Look in box 1,
- If "F" is in box x, stop and report x's value,
- Else, Add 1 to box x
- If you don't find the letter "F" in any box, report "not found."
So in this algorithm we are letting x keep track of what box we are searching. It starts at 1 and increases by 1 so that we will look at every box until we find 'F' or run out of boxes. If we find 'F' we report what box it was in by reporting x's value. If we don't find it, we report that it wasn't found.
Searching for 'F' in this set of boxes represents our worst case scenario because our algorithm would have to look in every box to conclude that 'F' was not in the boxes.
Make sure that you do the Self-check at this link. Also, make sure that you do the Google Site for 5.3