Found
Found is our search platform. It builds on Piped, so if you are interested in how it works, read the description of Piped as well.
This page briefly explains some of the capabilities of Found and its sister projects, Found:Connectors and Found:UI. See what problems we solve to learn why these features matter, and our services for how we can help you use them to rapidly and greatly improve your search.
- Content extraction is getting the content out of various
information systems, databases, etc. and into the indexing
pipeline. Whether it is crawling a web-page, acting on database
triggers or consuming a continuous content feed, the data must be
extracted and converted into an interchangeable format for further
processing.
The Found:Connectors project is built on Piped’s excellent connectivity capabilities. The preview release will come with connectors for eZ Publish and WordPress, and with examples of how to use the Scrapy crawler to scrape pages.
- Text analysis is the process of making sense of natural
language text. From guessing which language a document is written in,
to identifying the people and organizations it mentions and splitting
the text into indexable terms, it is an important part of any search
system.
The first release will support the excellent NLTK. Also, we are including special support for tokenizing Norwegian terms.
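As a rough illustration of two of the analysis steps named above, language guessing and tokenizing, here is a minimal sketch. It deliberately uses only the Python standard library rather than NLTK, and it is not Found's implementation: the stopword lists and the function names `tokenize` and `guess_language` are assumptions made for this example.

```python
import re

# Tiny, hypothetical stopword lists; real ones would be much longer.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "no": {"og", "er", "det", "en", "på", "som", "til"},
}

def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits and underscores."""
    # \w is Unicode-aware in Python 3, so Norwegian letters such as æ, ø, å survive.
    return re.findall(r"\w+", text.lower())

def guess_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = tokenize(text)
    hits = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None
```

Counting stopwords is only one of several ways to guess a language; it is shown here because it is easy to read, not because it is the most accurate.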
- Ranking is how the search system decides what is
“relevant”. Google uses hundreds of variables to influence its
ranking and has to constantly fight spammers who are trying to
game the system.
While you probably will not need something that complex, it is important to be able to influence the ranking easily. For example, the placement of a page or product in your document or product hierarchy, or whether the term appears in a title or in a comment, are influences that are easily accomplished with Found.
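To make the two influences mentioned above concrete (which field a term appears in, and where the page sits in the hierarchy), here is a minimal scoring sketch. It is not Found's ranking function: the weights, the `depth` field and the names `score` and `rank` are all assumptions chosen for illustration.

```python
# Hypothetical field weights: a match in the title counts more than one in a comment.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "comment": 0.5}

def score(document, term):
    """Score one document for one term; higher means more relevant."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = document.get(field, "").lower().split()
        total += weight * words.count(term)
    # Assumption: "depth" is the page's level in the hierarchy (0 = top level).
    # Pages closer to the top get a higher score for the same matches.
    return total / (1 + document.get("depth", 0))

def rank(documents, term):
    """Return the documents ordered from most to least relevant."""
    return sorted(documents, key=lambda doc: score(doc, term), reverse=True)
```

With these weights, a page with the term in its title outranks a page at the same depth that only mentions it in a comment.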
- Indexing is the process of combining all of the above and
building data structures that allow efficient searches.
With Found, it is easy to route data into different (or several) indexes. They can be completely different kinds of indexes as well: to Found, an index is just another place to put the data.
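The routing idea can be sketched as a list of (predicate, index) pairs, where a document goes to every index whose predicate accepts it. This is only an illustration under assumed names; `MemoryIndex` and `route` are hypothetical and do not describe Found's or Piped's actual interfaces.

```python
class MemoryIndex:
    """Stand-in for any kind of index: just another place to put the data."""

    def __init__(self, name):
        self.name = name
        self.documents = []

    def add(self, document):
        self.documents.append(document)

def route(document, routes):
    """Send the document to every index whose predicate accepts it.

    Returns the names of the indexes that received the document.
    """
    delivered = []
    for predicate, index in routes:
        if predicate(document):
            index.add(document)
            delivered.append(index.name)
    return delivered
```

For example, a product page could be routed both to a general site index and to a dedicated product index, while a blog post goes to the site index only.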
- Searching is, of course, the most important part of the search
system. The user inputs a search query, which is parsed,
pre-processed, rewritten, compiled, distributed, evaluated, merged
and post-processed before the results are returned to the user.
Found makes it easy to hook into any part of that process — and equally easy to ignore the intricacies of what you do not need to customize. For example, if you are implementing people search, you may want to transform the query so names that sound like the user’s input are found.
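One well-known way to match names that sound alike is the Soundex code, which maps a name to a letter followed by three digits. The sketch below uses American Soundex as a stand-in for whatever phonetic transform a people search would actually use; it is not taken from Found, and `expand_name_query` and its `known_names` argument are hypothetical.

```python
# Soundex digit for each consonant; vowels, y, h and w have no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)

def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]

def expand_name_query(query, known_names):
    """Return the known names that share the query's Soundex code."""
    key = soundex(query)
    return [name for name in known_names if soundex(name) == key]
```

A query transform like this would run in the rewriting step of the search process, replacing the user's spelling with the set of names that sound the same.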
- Logging is important for knowing what your users are searching for, which results they click on, which features of the user interface they use, etc.
Found helps you analyze those logs by structuring them and providing reports that show the most common searches, whether users find what they are looking for, changing trends in interest, indicators of a confusing information architecture, etc.
This information is important to help you influence the ranking functions mentioned above — possibly even automatically.
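Two of the reports described above, the most common searches and whether users find what they are looking for, can be approximated from structured log records as follows. The record layout (a `query` and a `clicked` field) and the function names are assumptions for this sketch, not Found's log format.

```python
from collections import Counter

def normalise(query):
    """Lowercase the query and collapse whitespace so variants count together."""
    return " ".join(query.lower().split())

def top_queries(log, n=10):
    """Return the n most common searches as (query, count) pairs."""
    counts = Counter(normalise(record["query"]) for record in log)
    return counts.most_common(n)

def abandonment_rate(log):
    """Return, per query, the share of searches that ended without a click."""
    searches = Counter()
    no_click = Counter()
    for record in log:
        query = normalise(record["query"])
        searches[query] += 1
        if record["clicked"] is None:
            no_click[query] += 1
    return {query: no_click[query] / searches[query] for query in searches}
```

A query that is both frequent and often abandoned is a natural candidate for a ranking adjustment.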
- The user interface is also key. If the user interface is not
intuitive and easy to use, it does not matter how powerful your
search is.
The Found:UI project is all about providing common search patterns as readily available components that know how to interact with the Found search server.