Found

Found is our search platform. It builds on Piped, so if you are interested in how it works, read the description of Piped as well.

This page briefly explains some of the capabilities of Found and its sister projects, Found:Connectors and Found:UI. See what problems we solve to learn why these features matter, and our services for how we can help you use them to improve your search quickly and substantially.

  • Content extraction means getting content out of various information systems, databases, etc. and into the indexing pipeline. Whether that involves crawling a web page, acting on database triggers or consuming a continuous content feed, the data must be extracted and converted into a common interchange format for further processing.

    The Found:Connectors project is built on Piped’s excellent connectivity capabilities. The preview release will come with connectors for eZ Publish and WordPress, and examples of how to use the Scrapy crawler to scrape pages.

  • Text analysis is the process of making sense of natural-language text. From guessing which language a document is written in, to identifying the people and organizations it mentions, to splitting the text into indexable terms, it is an important part of any search system.

    The first release will support the excellent NLTK. Also, we are including special support for tokenizing Norwegian terms.

  • Ranking is how the search system decides what is “relevant”. Google uses hundreds of variables to influence its ranking and has to constantly fight spammers who try to game the system.

    While you probably will not need something that complex, it is important to be able to influence the ranking easily. For example, the placement of a page or product in your document or product hierarchy, or whether a term appears in a title or in a comment, are influences that are easy to apply with Found.

  • Indexing is the process of combining all of the above and making data-structures that allow efficient searches.

    With Found, it is easy to route data into different (or several) indexes. These can be completely different kinds of indexes as well: to Found, an index is just another place to put the data.

  • Searching is, of course, the most important part of the search system. The user inputs a search query, which is parsed, pre-processed, rewritten, compiled, distributed, evaluated, merged and post-processed before the results are returned to the user.

    Found makes it easy to hook into any part of that process — and equally easy to ignore the intricacies of what you do not need to customize. For example, if you are implementing people search, you may want to transform the query so names that sound like the user’s input are found.

  • Logging is important for knowing what your users are searching for, which results they click on, which features of the user interface they use, etc.

    Found helps you analyze those logs by structuring them and providing reports that show the most common searches, whether users find what they are looking for, changing trends in interest, indicators of a confusing information architecture, etc.

    This information is important to help you influence the ranking functions mentioned above — possibly even automatically.

  • The user interface is also key. If the user interface is not intuitive and easy to use, it does not matter how powerful your search is.

    The Found:UI project is all about providing common search patterns as readily available components that know how to interact with the Found search server.
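
To make the people-search example from the searching section more concrete, here is a minimal Python sketch of a query pipeline with one hook that rewrites the query into phonetic keys. It does not show Found's actual API: the function names (`parse`, `expand_phonetic`, `evaluate`, `search`) are hypothetical, and Soundex is used only as a well-known stand-in for "sounds like" matching. Found may use a different algorithm.

```python
# Illustrative sketch only: these names are NOT part of Found's API.

# Standard American Soundex consonant groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex key for a name ("" if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded.append(code)
        previous = code  # vowels reset `previous`, so repeated codes count again
    return (first + "".join(encoded) + "000")[:4]


def parse(query):
    """Parsing stage: split the raw query into terms."""
    return {"terms": query.split()}


def expand_phonetic(parsed):
    """Hook: add phonetic keys so that similar-sounding names can match."""
    keys = (soundex(term) for term in parsed["terms"])
    parsed["phonetic_keys"] = [key for key in keys if key]
    return parsed


def evaluate(parsed, people):
    """Evaluation stage: keep people with a name part that shares a phonetic key."""
    keys = set(parsed["phonetic_keys"])
    return [p for p in people if any(soundex(part) in keys for part in p.split())]


def search(query, people, hooks=(expand_phonetic,)):
    """Run the query through parse -> hooks -> evaluate."""
    parsed = parse(query)
    for hook in hooks:
        parsed = hook(parsed)
    return evaluate(parsed, people)


people = ["John Smith", "Robert Jones", "Rupert Smythe"]
print(search("Smyth", people))  # ['John Smith', 'Rupert Smythe']
```

The hook is just a function in a list, so a deployment that does not need phonetic matching could pass a different tuple of hooks and leave the other stages untouched.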