copy-edit viget posts
@@ -2,7 +2,6 @@
 title: "Let’s Write a Dang ElasticSearch Plugin"
 date: 2021-03-15T00:00:00+00:00
 draft: false
-needs_review: true
 canonical_url: https://www.viget.com/articles/lets-write-a-dang-elasticsearch-plugin/
 ---
 
@@ -11,39 +10,37 @@ to search a large collection of news items. Some of the conditionals
 fall outside of the sweet spot of Postgres (e.g. word X must appear
 within Y words of word Z), and so we opted to pull in
 [ElasticSearch](https://www.elastic.co/elasticsearch/) alongside it.
-It\'s worked perfectly, hitting all of our condition and grouping needs
+It's worked perfectly, hitting all of our condition and grouping needs
 with one exception: we need to be able to filter for articles that
-contain a term a minimum number of times (so \"Apple\" must appear in
+contain a term a minimum number of times (so "Apple" must appear in
 the article 3 times, for example). Frustratingly, Elastic *totally* has
 this information via its
 [`term_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/current/term-vector.html)
-feature, but you can\'t use that data inside a query, at least as far as
+feature, but you can't use that data inside a query, at least as far as
 I can tell.
 
 The solution, it seems, is to write a custom plugin. I figured it out,
 eventually, but it was a lot of trial-and-error as the documentation I
-was able to find is largely outdated or incomplete. So I figured I\'d
-take what I learned while it\'s still fresh in my mind in the hopes that
-someone else might have an easier time of it. That\'s what internet
+was able to find is largely outdated or incomplete. So I figured I'd
+take what I learned while it's still fresh in my mind in the hopes that
+someone else might have an easier time of it. That's what internet
 friends are for, after all.
 
 Quick note before we start: all the version numbers you see are current
 and working as of February 25, 2021. Hopefully this post ages well, but
 if you try this out and hit issues, bumping the versions of Elastic,
 Gradle, and maybe even Java is probably a good place to start. Also, I
-use `projectname` a lot in the code examples --- that\'s not a special
+use `projectname` a lot in the code examples --- that's not a special
 word and you should change it to something that makes sense for you.
 
-[]{#1-set-up-a-java-development-environment}
+## 1. Set up a Java development environment
 
-## 1. Set up a Java development environment [\#](#1-set-up-a-java-development-environment "Direct link to 1. Set up a Java development environment"){.anchor aria-label="Direct link to 1. Set up a Java development environment"}
-
-First off, you\'re gonna be writing some Java. That\'s not my usual
+First off, you're gonna be writing some Java. That's not my usual
 thing, so the first step was to get a working environment to compile my
-code. To do that, we\'ll use [Docker](https://www.docker.com/). Here\'s
+code. To do that, we'll use [Docker](https://www.docker.com/). Here's
 a `Dockerfile`:
 
-``` {.code-block .line-numbers}
+```dockerfile
 FROM adoptopenjdk/openjdk12:jdk-12.0.2_10-ubuntu
 
 RUN apt-get update &&
@@ -70,17 +67,15 @@ your local working directory into `/plugin`:
 
 `> docker run --rm -it -v ${PWD}:/plugin projectname-java bash`
 
-[]{#2-configure-gradle}
+## 2. Configure Gradle
 
-## 2. Configure Gradle [\#](#2-configure-gradle "Direct link to 2. Configure Gradle"){.anchor aria-label="Direct link to 2. Configure Gradle"}
-
-[Gradle](https://gradle.org/) is a \"build automation tool for
-multi-language software development,\" and what Elastic recommends for
+[Gradle](https://gradle.org/) is a "build automation tool for
+multi-language software development," and what Elastic recommends for
 plugin development. Configuring Gradle to build the plugin properly was
 the hardest part of this whole endeavor. Throw this into `build.gradle`
 in your project root:
 
-``` {.code-block .line-numbers}
+```gradle
 buildscript {
 repositories {
 mavenLocal()
@@ -116,28 +111,26 @@ esplugin {
 validateNebulaPom.enabled = false
 ```
 
-You\'ll also need files named `LICENSE.txt` and `NOTICE.txt` --- mine
-are empty, since the plugin is for internal use only. If you\'re going
+You'll also need files named `LICENSE.txt` and `NOTICE.txt` --- mine
+are empty, since the plugin is for internal use only. If you're going
 to be releasing your plugin in some public way, maybe talk to a lawyer
 about what to put in those files.
 
-[]{#3-write-the-dang-plugin}
-
-## 3. Write the dang plugin [\#](#3-write-the-dang-plugin "Direct link to 3. Write the dang plugin"){.anchor aria-label="Direct link to 3. Write the dang plugin"}
+## 3. Write the dang plugin
 
 To write the actual plugin, I started with [this example
 plugin](https://github.com/elastic/elasticsearch/blob/master/plugins/examples/script-expert-scoring/src/main/java/org/elasticsearch/example/expertscript/ExpertScriptPlugin.java)
 which scores a document based on the frequency of a given term. My use
-case was fortunately quite similar, though I\'m using a `filter` query,
+case was fortunately quite similar, though I'm using a `filter` query,
 meaning I just want a boolean, i.e. does this document contain this term
 the requisite number of times? As such, I implemented a
 [`FilterScript`](https://www.javadoc.io/doc/org.elasticsearch/elasticsearch/latest/org/elasticsearch/script/FilterScript.html)
 rather than the `ScoreScript` implemented in the example code.
 
 This file lives in (deep breath)
-`src/main/java/com/projectname/containsmultiple/ContainsMultiplePlugin.java`:
+`src/main/java/com/projectname/` `containsmultiple/ContainsMultiplePlugin.java`:
 
-``` {.code-block .line-numbers}
+```java
 package com.projectname.containsmultiple;
 
 import org.apache.lucene.index.LeafReaderContext;
@@ -311,26 +304,24 @@ public class ContainsMultiplePlugin extends Plugin implements ScriptPlugin {
 }
 ```
 
-[]{#4-add-it-to-elasticSearch}
-
-## 4. Add it to ElasticSearch [\#](#4-add-it-to-elasticSearch "Direct link to 4. Add it to ElasticSearch"){.anchor aria-label="Direct link to 4. Add it to ElasticSearch"}
+## 4. Add it to ElasticSearch
 
 With our code in place (and synced into our Docker container with a
-mounted volume), it\'s time to compile it. In the Docker shell you
+mounted volume), it's time to compile it. In the Docker shell you
 started up in step #1, build your plugin:
 
 `> gradle build`
 
 Assuming that works, you should now see a `build` directory with a bunch
 of stuff in it. The file you care about is
-`build/distributions/contains-multiple-0.0.1.zip` (though that\'ll
+`build/distributions/contains-multiple-0.0.1.zip` (though that'll
 obviously change if you call your plugin something different or give it
 a different version number). Grab that file and copy it to where you
 plan to actually run ElasticSearch. For me, I placed it in a folder
 called `.docker/elastic` in the main project repo. In that same
-directory, create a new `Dockerfile` that\'ll actually run Elastic:
+directory, create a new `Dockerfile` that'll actually run Elastic:
 
-``` {.code-block .line-numbers}
+```dockerfile
 FROM docker.elastic.co/elasticsearch/elasticsearch:7.11.1
 
 COPY .docker/elastic/contains-multiple-0.0.1.zip /plugins/contains-multiple-0.0.1.zip
@@ -341,37 +332,35 @@ RUN elasticsearch-plugin install
 
 Then, in your project root, create the following `docker-compose.yml`:
 
-``` {.code-block .line-numbers}
+```yaml
 version: '3.2'
 
 services: elasticsearch:
-image: projectname_elasticsearch
-build:
-context: .
-dockerfile: ./.docker/elastic/Dockerfile
-ports:
-- 9200:9200
-environment:
-- discovery.type=single-node
-- script.allowed_types=inline
-- script.allowed_contexts=filter
+image: projectname_elasticsearch
+build:
+context: .
+dockerfile: ./.docker/elastic/Dockerfile
+ports:
+- 9200:9200
+environment:
+- discovery.type=single-node
+- script.allowed_types=inline
+- script.allowed_contexts=filter
 ```
 
-Those last couple lines are pretty important and your script won\'t work
+Those last couple lines are pretty important and your script won't work
 without them. Build your image with `docker-compose build` and then
 start Elastic with `docker-compose up`.
 
-[]{#5-use-your-plugin}
-
-## 5. Use your plugin [\#](#5-use-your-plugin "Direct link to 5. Use your plugin"){.anchor aria-label="Direct link to 5. Use your plugin"}
+## 5. Use your plugin
 
 To actually see the plugin in action, first create an index and add some
-documents (I\'ll assume you\'re able to do this if you\'ve read this far
+documents (I'll assume you're able to do this if you've read this far
 into this post). Then, make a query with `curl` (or your Elastic wrapper
 of choice), substituting `full_text`, `yabba` and `index_name` with
 whatever makes sense for you:
 
-``` {.code-block .line-numbers}
+```
 > curl -H "content-type: application/json"
 -d '
 {
@@ -398,7 +387,7 @@ whatever makes sense for you:
 
 The result should be something like:
 
-``` {.code-block .line-numbers}
+```json
 {
 "took" : 6,
 "timed_out" : false,
@@ -422,6 +411,6 @@ The result should be something like:
 ...
 ```
 
-So that\'s that, an ElasticSearch plugin from start-to-finish. I\'m sure
-there are better ways to do some of this stuff, and if you\'re aware of
+So that's that, an ElasticSearch plugin from start-to-finish. I'm sure
+there are better ways to do some of this stuff, and if you're aware of
 any, let us know in the comments or write your own dang blog.