This is part one of a two post blog series, aiming to demonstrate how to feed logs from IIS into Elasticsearch and Kibana via Logstash, using the hosted services provided by Found by Elastic. This post will deal with setting up the basic functionality and securing connections. Part 2 will show how to configure Logstash to read from IIS log files, and how to use Kibana 4 to visualize web traffic.
Learn and play with Elasticsearch
ArticlesGet the RSS
Elasticsearch is used for a lot of different use cases: "classical" full text search, analytics store, auto completer, spell checker, alerting engine, and as a general purpose document store. This article gives a brief overview of different common uses and important things to consider, with pointers to where you can learn more about them.
We all know memory is critical to Elasticsearch, but when should you add more? In the Found console we've included a memory pressure indicator so that you can easily check where your cluster is at in terms of memory usage and capacity. For those familiar with how the JVM garbage collector works: "The indicator uses the fill percentage of the old generation pool". In this article I will cover the basics of the old pool and why we chose that as the indicator.
When starting using Elasticsearch, it's easy to get confused about all the different ways to connect to Elasticsearch and why one of them should be preferred over the other. In this article we'll provide an overview of the different client types available and give some pointers on when one should be chosen over another.
Full-text search is an interesting field for backend developers: it requires raw technical knowledge, knowledge about the input languages and lots of feedback gathering on the usability side. In this article, I'd like to give you a rough overview of the challenges in each area and inform you about basic solutions for them.
As much as we love Elasticsearch, at Found we've seen customers crash their clusters in numerous ways. Mostly due to simple misunderstandings and usually the fixes are fairly straightforward. In our quest to enlighten new adopters and entertain the experienced users of Elasticsearch, we'd like to share this list of pitfalls.
Scripting is an important part of the toolbox of any Elasticsearch user, and enables evaluating custom expressions that may be used to synthesize fields or provide customized scoring. In this post we take a brief look at the upcoming changes to the scripting module and the different scripting languages available to use today.
Let's explore Apache ZooKeeper, a distributed coordination service for distributed systems. Needless to say, there are plenty of use cases! At Found, for example, we use ZooKeeper extensively for discovery, resource allocation, leader election and high priority notifications. In this article, we'll introduce you to this King of Coordination and look closely at how we use ZooKeeper at Found.
LIRE (Lucene Image REtrieval) is a plugin for Lucene to index and search images. A cool and quirky feature that sets it apart is that it does content based retrieval, a fancy word for saying that you use images in your search query and it retrieves similar images from the index. In order to use LIRE with Elasticsearch, we need to make Elasticsearch aware of the new data type and the query that is provided by LIRE. Luckily there is a plugin for Elasticsearch that does just that.
One of the important aspects of Elasticsearch is that it is programming language independent. All of the APIs for indexing, searching and monitoring can be accessed using HTTP and JSON so it can be integrated in any language that has those capabilities. Nevertheless Java, the language Elasticsearch and Lucene are implemented in, is very dominant. In this post I would like to show you some of the options for integrating Elasticsearch with a Java application.
Get started analyzing your logs on OpenShift. While OpenShift lets you tail the logs of your apps, the Elasticsearch/Logstash/Kibana trinity gives you a very flexible and powerful toolchain to visualize and analyze these logs. This article explains how to make a Logstash cartridge on OpenShift. The cartridge feeds your logs into Elasticsearch where you can use the Kibana visualization engine to follow trends, detect anomalies and inspect incidents in your environment.
We're often asked "How big a cluster do I need?", and it's usually hard to be more specific than "Well, it depends!". There are so many variables, where knowledge about your application's specific workload and your performance expectations are just as important as the number of documents and their average size. In this article we won't offer a specific answer or a formula, instead we will equip you with a set of questions you'll want to ask yourself, and some tips on finding their answers.
Elasticsearch's Fuzzy query is a powerful tool for a multitude of situations. Username searches, misspellings, and other funky problems can oftentimes be solved with this unconventional query. In this article we clarify the sometimes confusing options for fuzzy searches, as well as dive into the internals of Lucene's FuzzyQuery.
Elasticsearch does not perform authentication or authorization, leaving that as an exercise for the developer. This article gives an overview of things to keep in mind when you configure the security settings for your Elasticsearch cluster, providing users with (limited) access to your cluster when you cannot necessarily (entirely) trust them.
One of the trickiest parts of integrating Elasticsearch into an existing app is figuring out how to manage the flow of data from an authoritative data source, such as an SQL database, into Elasticsearch. In most cases this means utilizing Elasticsearch's bulk API. It also implies designing an application to effectively make data available in an efficient, robust, on-time manner. This usually requires modifying an application's workflow to replicate data in batches. This article is a survey of various patterns for accomplishing this task.
Elasticsearch easily lets you develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: a system with high demands on network stability, a huge appetite for memory, and a system that assumes all users are trustworthy. These articles cover some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters.
Can Elasticsearch be used as a "NoSQL"-database? NoSQL means different things in different contexts, and interestingly it's not really about SQL. We will start out with a "Maybe!", and look into the various properties of Elasticsearch as well as those it has sacrificed, in order to become one of the most flexible, scalable and performant search and analytics engines yet.
I am not a programmer, so I endeavoured to understand what Found's developers were talking about by starting to read up on the basics of search engine indexing. Finding documents that describe search engine indexing was easy. Understanding them at a beginner's level was anything but. I therefore decided to set off writing articles that explain search engine indexing from a beginner's perspective.