Piped

Piped is a framework for developers, so this page is a bit technical. See our services for how we can help you improve your search!

Piped, the framework Found is built on, makes it easy to get data from somewhere, slice it, dice it, and send the result somewhere else. It grew out of necessity: we quickly realised that we were solving the same data-integration problems over and over.

With Piped, business processes are described almost literally as flow-charts. The process of searching, for example, is in much simplified form 1) pre-processing and validating, 2) searching, and 3) post-processing. Searching a people directory and searching a huge document index are conceptually similar, but some of the steps clearly differ. The people search will probably have a pre-processing step that enables searching for words that sound like the user's input. The search in the large index does not need that kind of rewriting, but because the index is much larger, the search might need to be distributed over many machines.
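To make the comparison concrete, here is a minimal sketch in plain Python. It does not use Piped, and every name in it (`validate`, `crude_phonetic_key`, `run_pipeline`, the sample data) is made up for illustration. The "sounds like" step is a toy stand-in for a real phonetic algorithm. The point is only that the two searches share the same three-stage shape and differ in a single step.

```python
# Hypothetical sketch in plain Python; none of these names come from Piped.

PEOPLE = ["John Smith", "Jane Smyth", "Alan Turing"]
DOCUMENTS = ["Piped wires plugins together", "Twisted is a networking library"]


def validate(query):
    """Pre-processing: normalise whitespace and case, reject empty input."""
    cleaned = " ".join(query.split()).lower()
    if not cleaned:
        raise ValueError("empty query")
    return cleaned


def crude_phonetic_key(word):
    """Toy 'sounds like' key: keep the first letter, then drop vowels
    and letters that repeat the previous kept letter."""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouy" or ch == key[-1]:
            continue
        key += ch
    return key


def phonetic_rewrite(query):
    """People search only: rewrite each query word to its phonetic key."""
    return [crude_phonetic_key(word) for word in query.split()]


def exact_terms(query):
    """Document search: no rewriting, just split into terms."""
    return query.split()


def search_people(keys):
    return [name for name in PEOPLE
            if any(crude_phonetic_key(word.lower()) in keys
                   for word in name.split())]


def search_documents(terms):
    return [doc for doc in DOCUMENTS
            if all(term in doc.lower().split() for term in terms)]


def post_process(hits):
    """Post-processing: here, simply sort the hits."""
    return sorted(hits)


def run_pipeline(steps, data):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# Same overall flow, different steps in the middle.
people_search = [validate, phonetic_rewrite, search_people, post_process]
document_search = [validate, exact_terms, search_documents, post_process]

people_hits = run_pipeline(people_search, "Smith")
document_hits = run_pipeline(document_search, "  Twisted   LIBRARY ")
```

In this sketch a query for "Smith" also finds "Jane Smyth", because both surnames reduce to the same toy key, while the document search matches on exact terms only.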

Plug Together a Processing Graph

Piped is all about wiring together plugins in a graph-structure — a pipeline — that makes the plugins work together to accomplish something useful. A lot of functionality is provided by Piped, and by Found if you are solving search problems.

The figure shows a simple example of a "pipeline": a configured combination of "processors", here the nodes A–D. Processors are plugins, i.e. simple Python classes that implement a certain interface and give themselves a name, preferably something more descriptive than "A".
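The idea of named processor classes that are combined by configuration can be sketched as follows. This is not Piped's actual plugin interface: the `Processor` base class, the `process` method, the `REGISTRY` lookup and the shape of `config` are all assumptions made for illustration, and the example wires a simple linear chain rather than a general graph.

```python
# Hypothetical sketch; Piped's real plugin interface may look different.

class Processor:
    """Assumed minimal interface: a name plus a process() method."""
    name = None

    def process(self, item):
        raise NotImplementedError


class Strip(Processor):
    name = "strip"

    def process(self, item):
        return item.strip()


class Replace(Processor):
    name = "replace"

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def process(self, item):
        return item.replace(self.old, self.new)


class Upper(Processor):
    name = "upper"

    def process(self, item):
        return item.upper()


# Processors are looked up by the name they give themselves.
REGISTRY = {cls.name: cls for cls in (Strip, Replace, Upper)}


def build_pipeline(config):
    """Turn a list of {processor, options} entries into processor instances."""
    return [REGISTRY[step["processor"]](**step.get("options", {}))
            for step in config]


def run(pipeline, item):
    """Pass the item through each processor in order."""
    for processor in pipeline:
        item = processor.process(item)
    return item


# A plain Python structure standing in for a configuration file.
config = [
    {"processor": "strip"},
    {"processor": "replace", "options": {"old": "-", "new": " "}},
    {"processor": "upper"},
]

pipeline = build_pipeline(config)
result = run(pipeline, "  data-integration ")
```

Because the pipeline is built from data rather than hard-coded calls, swapping in a custom processor only requires registering a new class and naming it in the configuration.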

The processors provided by Piped and Found are usually quite general and configurable. However, it is really simple to insert your own code at any point in the processing graphs. The entire point is to make it easily customizable — and the difference between a battle-tested best-practice and what you need to solve a specific problem is often just a few custom lines here and there!

The processing graphs are configured with succinct YAML files, which are easy to comment and to keep under version control. We also auto-generate diagrams to visualise how the data flows.

Flexible and Scalable

A plugin and pipeline system makes it very easy to tailor processing pipelines. Here is how you can put such a pipeline into production and process massive amounts of data.

Consider the process of indexing documents. If the documents are PDFs, Word documents, etc., they first need to be converted to text and metadata in the format that subsequent processing expects. The text-processing step then takes the text and metadata, processes it, and passes indexable data to the indexing step.

A developer can code and test these processing pipelines on her laptop as a sequential process. However, the document-filtering and text-processing steps are both trivially parallelizable. By inserting some message queues between the steps that can be done in parallel, it’s simple to scale up the processing capacity. Any number of computers (and cores) can be doing filtering and text-processing, and the results are all funneled to the last indexing-processor. (Of course, there could be several of those as well!)
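The fan-out and fan-in pattern described above can be sketched with Python's standard library. This is an in-process stand-in only: `queue.Queue` and threads play the role of real message queues and separate machines, and the function names (`convert`, `analyse`, `index_all`, `process`) and the document format are invented for the example, not taken from Piped.

```python
# Hypothetical sketch: threads and queue.Queue stand in for machines
# and message queues. None of these names come from Piped.
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def convert(doc):
    """Stand-in for document filtering (PDF/Word to text and metadata)."""
    return {"id": doc["id"], "text": doc["raw"].decode("utf-8")}


def analyse(doc):
    """Stand-in for text processing: lowercase and tokenise."""
    return {"id": doc["id"], "tokens": doc["text"].lower().split()}


def worker(fn, inbox, outbox):
    """Apply fn to every item from inbox until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        outbox.put(fn(item))


def index_all(inbox, index):
    """Single indexing step: build a token -> document ids mapping."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        for token in item["tokens"]:
            index.setdefault(token, set()).add(item["id"])


def start_workers(fn, inbox, outbox, count):
    threads = [threading.Thread(target=worker, args=(fn, inbox, outbox))
               for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def shut_down(inbox, threads):
    """Send one sentinel per worker, then wait for all of them."""
    for _ in threads:
        inbox.put(STOP)
    for thread in threads:
        thread.join()


def process(documents, workers=3):
    raw, converted, analysed = queue.Queue(), queue.Queue(), queue.Queue()
    index = {}
    converters = start_workers(convert, raw, converted, workers)
    analysers = start_workers(analyse, converted, analysed, workers)
    indexer = threading.Thread(target=index_all, args=(analysed, index))
    indexer.start()

    for doc in documents:
        raw.put(doc)

    # Drain the stages in order so no item is lost.
    shut_down(raw, converters)
    shut_down(converted, analysers)
    analysed.put(STOP)
    indexer.join()
    return index


index = process([
    {"id": 1, "raw": b"Piped pipelines"},
    {"id": 2, "raw": b"Twisted and Piped"},
])
```

The step functions themselves do not know whether they run sequentially or behind a queue, which is what makes it possible to develop on a laptop and scale out later by changing only the wiring.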

By expressing your business process as a pipeline in Piped, it’s that easy to split it up, unleash it on your cluster (or favorite cloud-provider) and process tons of data.

Well-Connected

As the previous section illustrated, it is easy to get Piped to communicate through message queues.

There are several other ways to put things into Piped, though — or get the results out.

Piped leverages Twisted’s excellent libraries, and the developer preview will contain these means of communicating:

  • Web: Twisted’s battle-tested HTTP-server makes it trivial to communicate with Piped through HTTP. Easily connect your pipelines to handlers — and an included interactive debugger eases debugging them.
  • Message Queues: ZeroMQ support is implemented. Supporting other message queues like AMQP is not a lot of work.
  • Processes: Communicating with other processes is easy.
  • SSH: You can SSH into the running process and get an interactive Python-console. Inspect, tweak, debug, hack.
  • Perspective Broker: Twisted’s convenient RPC-stack.

Since Piped is built on Twisted, it is easy to add support for more network protocols.