Author, reviewer and revision dates:

Created by John C. Thomas on 6th of September, 2001

Reviewed by <> on <>
Revised by <> on <>

Synonyms:

Multiple Criterion Search, Fuzzy Search on Several Dimensions

Abstract:
People should be able to find what they are looking for in a large sea of data by approximately specifying a number of dimensions simultaneously rather than by being completely accurate on any one dimension.

Problem:
Because of the way human memory works, we often recall approximate information about an item on a number of dimensions rather than completely accurate information on any one dimension. In many cases, it is not that people don't have access to the information that they need; the problem is in finding it. It can be time-consuming, inefficient, and frustrating to find a specific document or fact in a large collection using current search techniques. A facility should be available that more closely matches the way human memory works. This facility should be invokable by a number of different applications.

Context:
Over time, people often forget the file name that they gave to a document, presentation, or program. Nonetheless, they generally remember something about the document, such as the approximate date, title, size and so on. Current search facilities do not let users search efficiently with the information that they do retain. It may well be the case that only one document, or a small set of documents, would match all the approximate specifications that could be given. Unfortunately, that is not currently an option, and the person must search linearly through a long list hoping to recognize the name.

Many people find occasion to use web search engines. In some cases, they are trying to find a specific item and, again, they may well know something about multiple attributes of what they are looking for without being able to specify any term exactly. In this case, it is not only a problem of forgetting. They may simply be using a different term from the one that the author of the document or web page chose.

Forces:
People spontaneously use different terms to refer to the same item.
People often recall approximate information on many dimensions although they may not recall specific information on any one dimension.
People sometimes need to find one particular document, person, or other resource.
Computers are good at fast linear searches.
People are slow at linear searches.
People will often give up on a search, even when the correct result has been returned, if that result is hidden in a large set of returned items.
People's time is costly compared with computer time.
The amount of information a person may want or need to search through grows over time.

Solution:
Allow people to specify as much as, and no more than, they know. In many cases, this will mean that people need to be able to supply approximate information on multiple dimensions. If the underlying data structures are rich, then search terms should be applied to the appropriate dimensions. For example, if documents have an associated attribute value structure that specifies Title, Author, Date, and Size, then people should be asked to specify as much as they can about each dimension separately. These terms can then be applied in an appropriately fuzzy or expanded manner against the separate dimensions, and the results collated together. Indices can be expanded by using additional knowledge sources such as a hierarchical ontology, a thesaurus, an ordered list, an organization chart, a social network analysis and so on.
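The per-dimension matching described above might be sketched as follows. This is only an illustration under stated assumptions, not a definitive implementation: the Document fields, the score and search functions, and the tolerance defaults (90 days, 50 percent of the size) are all hypothetical names and values chosen for the example. String similarity comes from Python's standard difflib module; any other fuzzy matcher could be substituted.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Document:
    # Hypothetical attribute structure: Title, Author, Date, Size.
    title: str
    author: str
    created: date
    size_kb: int


def text_score(query, value):
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, query.lower(), value.lower()).ratio()


def closeness(distance, tolerance):
    """1.0 at an exact match, falling linearly to 0.0 at `tolerance`."""
    return max(0.0, 1.0 - distance / tolerance)


def score(doc, title=None, author=None, around=None, size_kb=None,
          day_tolerance=90, size_tolerance=0.5):
    """Average the scores of only those dimensions the user supplied."""
    parts = []
    if title is not None:
        parts.append(text_score(title, doc.title))
    if author is not None:
        parts.append(text_score(author, doc.author))
    if around is not None:
        days_off = abs((doc.created - around).days)
        parts.append(closeness(days_off, day_tolerance))
    if size_kb is not None:
        # Relative error, so "about 100 KB" tolerates 50 to 150 KB by default.
        relative = abs(doc.size_kb - size_kb) / max(size_kb, 1)
        parts.append(closeness(relative, size_tolerance))
    return sum(parts) / len(parts) if parts else 0.0


def search(docs, **criteria):
    """Collate the per-dimension scores and rank documents best-first."""
    return sorted(docs, key=lambda d: score(d, **criteria), reverse=True)
```

A call such as search(docs, title="budjet report", author="thomas", around=date(2001, 9, 1), size_kb=100) lets the person be somewhat wrong on every dimension at once. Dimensions that are left out simply do not take part in the score, which reflects "as much as, and no more than, they know".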

If, on the other hand, the data is less richly structured, search terms can be combined across a single index expanded in the same way. In either case, the result is a search algorithm that produces greater recall at the expense of precision. Lower precision can be handled in multiple ways. For example, a broader search may be applied only if a narrow and specific search gives no result. Or, the results of both a broad and a narrow search may be returned, with the narrow results shown first. Alternatively, weighting factors can be applied so that precise matches count more heavily than imprecise matches.
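The strategies for handling lower precision can be illustrated with a small sketch over a single, unstructured index. Everything here is an assumption made for the example: the THESAURUS contents, the two weight values, and the function names are hypothetical, and documents are treated as plain strings of whitespace-separated words.

```python
# Hypothetical thesaurus: each term maps to terms treated as near-synonyms.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "picture": {"photo", "image"},
}

EXACT_WEIGHT = 1.0     # weight for a term the user actually typed
EXPANDED_WEIGHT = 0.5  # weight for a term reached only through the thesaurus


def expand(terms):
    """Map each query term to a weight, adding down-weighted synonyms."""
    weights = {t: EXACT_WEIGHT for t in terms}
    for t in terms:
        for synonym in THESAURUS.get(t, ()):
            weights.setdefault(synonym, EXPANDED_WEIGHT)
    return weights


def rank(docs, weights):
    """Score each document by the summed weights of the terms it contains."""
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        total = sum(w for term, w in weights.items() if term in words)
        if total > 0:
            scored.append((total, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def search(docs, query, always_broaden=False):
    """Narrow search first; broaden on an empty result or when asked to."""
    terms = query.lower().split()
    if not always_broaden:
        narrow = rank(docs, {t: EXACT_WEIGHT for t in terms})
        if narrow:
            return narrow
    # Broad search: exact terms outweigh synonyms, so precise matches lead.
    return rank(docs, expand(terms))
```

With always_broaden left at False, the broad search runs only when the narrow one finds nothing. With always_broaden set to True, both kinds of match are returned together, and the weighting places the precise matches ahead of the imprecise ones.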

Rationale:
Human memory is associative and evolved over millions of years to deal with physical reality. In physical reality, as you approach something more closely, more detail is revealed to the senses. This helps you see, hear, feel or smell what you are looking for. In addition, the added detail serves as a further reminder of any relevant memories of previous experiences. It must also be noted that some things, e.g., the locations of plants and animals, change somewhat over time.

Suppose that one summer, a person is foraging and comes across a blackberry patch. Twenty years later, he comes back to the same area and is reminded of the blackberries. He recalls approximately where they are and as he nears, he sees the familiar pattern of blackberry bushes. He approaches more closely still and sees the blackberries. By contrast, if one were to replay this scene in the typical world of computer architecture, the interaction would go more like this: "Ah, I found some blackberries. Hmmm. I wonder how I can find them again. Computer, help me remember where these blackberries are." Computer: "Certainly. These blackberries are at Latitude North 40 degrees, 47 minutes, 22.122 seconds and Longitude West 73 degrees, 58 minutes, 12.718 seconds. If you ever need to find them again, simply type in these coordinates." This is an artificial and difficult way for people to remember how to find things. It is, of course, a simple table lookup for the computer, but the cost of people's time in dealing with such interfaces is now far greater than the cost of the processing power that it takes to provide a more natural search facility.

Related Patterns:
Parallel Processing Support for People, Render Unto Ceasar

Known Uses:
Human agents such as travel agents, real estate agents, or librarians are good at this sort of thing. They listen to multiple constraints of their clients and then find items that match or nearly match on multiple criteria.

References:
<< to be developed >>
