Problems We Solve

“If you don’t find it in the index, look very carefully through the entire catalog.”
— Sears, Roebuck, and Co., Consumers’ Guide, 1897

Luckily, things have changed a lot since 1897. We assume you know why search is an important part of today’s information systems, so we’re not going to waste your time on the obvious: how many hours a week your employees can save, that customers will not buy a product they cannot find, and so on.

The diagram shows a simple overview of some important processes in a search system. What follows explains why they are important and how they tie together. See the description of Found for an overview of how we approach the problems.

When the search server receives a query, a number of things happen. We will get back to those, but ultimately the query results in a lookup in one or more indexes: data structures designed to answer certain kinds of questions really fast. Usually, the query is made up of search terms, but it can also contain locations, points in time, colors, value ranges or even the contents of images.
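To make the idea of an index concrete, here is a minimal sketch of one common kind, an inverted index that maps each term to the documents containing it. The function names and sample documents are made up for illustration, and a real search server does far more than this.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One set of document ids per query term; unknown terms match nothing.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample data.
documents = {
    1: "cheap blue shirt",
    2: "blue jeans",
    3: "cheap red shirt",
}
index = build_index(documents)
print(lookup(index, "blue shirt"))
```

The lookup never scans the documents themselves. It only intersects the precomputed sets of ids, which is why this kind of structure can answer term queries quickly.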

These indexes are built from the content of various data sources. Indexing is the process of extracting the text and metadata necessary to answer the users’ queries. Using rules devised by an information expert, the indexing process also gives the documents various scores that are used in the ranking process. Ranking is essential to ensure that users receive relevant results.
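As a rough illustration of how an expert's rules can feed into ranking, the sketch below counts query term matches per field and weights them with hand-picked boosts. The field names, boost values and sample documents are assumptions chosen for the example, not a description of any particular product.

```python
def score(doc, query_terms, field_boosts):
    """Sum the term matches in each field, weighted by that field's boost."""
    total = 0.0
    for field, boost in field_boosts.items():
        words = doc.get(field, "").lower().split()
        matches = sum(words.count(term) for term in query_terms)
        total += boost * matches
    return total


def rank(docs, query, field_boosts):
    """Return ids of matching documents, best score first (ties by id)."""
    terms = query.lower().split()
    scored = [(score(doc, terms, field_boosts), doc["id"]) for doc in docs]
    hits = [(s, doc_id) for s, doc_id in scored if s > 0]
    return [doc_id for s, doc_id in sorted(hits, key=lambda p: (-p[0], p[1]))]


# Hypothetical rule: a match in the title counts three times as much
# as a match in the body.
FIELD_BOOSTS = {"title": 3.0, "body": 1.0}

docs = [
    {"id": "a", "title": "Blue shirt", "body": "a cotton shirt"},
    {"id": "b", "title": "Laundry tips", "body": "how to wash a blue shirt"},
    {"id": "c", "title": "Red hat", "body": "a wool hat"},
]
print(rank(docs, "blue shirt", FIELD_BOOSTS))
```

Changing the boosts changes the order of the results without touching the documents, which is the kind of adjustment an information expert would make when tuning relevance.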

To get an idea of what the users are looking for, user behaviour such as search input and clicks is logged so that it can be analyzed later. This can tell us whether the users are finding what they are looking for, or whether the ranking rules need to be improved. Notice the loop in the diagram: the actions of the users influence the ranking, which in turn affects indexing, eventually changing the results the users see.
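One simple way to analyze such logs is to compute, for each query, the share of searches that led to a click. The sketch below assumes a hypothetical log format of (query, clicked) pairs and an arbitrary threshold; real logs carry much more detail.

```python
from collections import defaultdict


def click_through_rates(events):
    """events: iterable of (query, clicked) pairs. Returns query -> rate."""
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in events:
        searches[query] += 1
        if clicked:
            clicks[query] += 1
    return {q: clicks[q] / searches[q] for q in searches}


def needs_attention(events, threshold=0.2):
    """Queries whose click-through rate falls below the threshold."""
    rates = click_through_rates(events)
    return sorted(q for q, rate in rates.items() if rate < threshold)


# Hypothetical log: each entry is one search and whether any result was clicked.
log = [
    ("blue shirt", True),
    ("blue shirt", True),
    ("blue shirt", False),
    ("gift card", False),
    ("gift card", False),
]
print(needs_attention(log))
```

Queries that are searched often but rarely clicked are candidates for a closer look, for example to check whether the ranking rules or the indexed content need to change.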

Text processing performs natural language analysis. For example, words can be reduced to their base form, compound words can be split up, spelling errors fixed, synonyms expanded, and so on. For this to work, compatible analysis must have been applied when the documents were indexed.
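The sketch below shows why the analysis has to be compatible on both sides: the same function is applied to the document and to the query before they are compared. The plural stripping is deliberately naive and the synonym table is a made-up assumption; proper stemming and synonym handling are considerably more involved.

```python
# Hypothetical synonym table mapping a variant to its canonical term.
SYNONYMS = {"tee": "shirt"}


def analyze(text):
    """Lowercase, strip punctuation, crudely remove plurals, map synonyms."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if not word:
            continue
        # Naive plural handling: "shirts" -> "shirt", but keep "dress".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        tokens.append(SYNONYMS.get(word, word))
    return tokens


def matches(document_text, query):
    """True if every analyzed query term occurs in the analyzed document."""
    return set(analyze(query)) <= set(analyze(document_text))


print(analyze("Blue Shirts!"))
print(matches("Blue Shirts!", "blue tee"))
```

If only the query were analyzed, the term "shirt" would be compared against the raw token "Shirts!" and the match would fail, even though a human reader would consider the document relevant.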

Users search through a user interface, which communicates with a server that performs the searches. Users will typically expect results in less than a second, so these components must be well tuned and respond fast, even with many concurrent users.

If your users are experts using the same search in their daily work, they will demand advanced searching options. If they are customers looking to buy a cheap blue shirt that is in stock, they are not going to spend time learning an interface. Either way, the user interface must be intuitive while encouraging users to explore the data.