copy-edit viget posts

David Eisinger
2023-10-24 20:48:09 -04:00
parent 0438a6d828
commit f86f391e82
77 changed files with 1663 additions and 1380 deletions


@@ -2,7 +2,6 @@
title: "Let's Write a Dang ElasticSearch Plugin"
date: 2021-03-15T00:00:00+00:00
draft: false
needs_review: true
canonical_url: https://www.viget.com/articles/lets-write-a-dang-elasticsearch-plugin/
---
@@ -11,39 +10,37 @@ to search a large collection of news items. Some of the conditionals
fall outside of the sweet spot of Postgres (e.g. word X must appear
within Y words of word Z), and so we opted to pull in
[ElasticSearch](https://www.elastic.co/elasticsearch/) alongside it.
It's worked perfectly, hitting all of our condition and grouping needs
with one exception: we need to be able to filter for articles that
contain a term a minimum number of times (so "Apple" must appear in
the article 3 times, for example). Frustratingly, Elastic *totally* has
this information via its
[`term_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/current/term-vector.html)
feature, but you can't use that data inside a query, at least as far as
I can tell.
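Stated as code, the condition we're after is tiny. Here's a plain-Java sketch of it, purely to pin down the requirement. `MinTermCount`, `countTerm`, and `containsAtLeast` are made-up names. The lowercase-and-split tokenizing is an assumption standing in for whatever analyzer your field really uses. Elastic's analyzers may stem words or drop stop words, so their counts can differ.

```java
import java.util.Locale;

/**
 * Plain-Java illustration of "this term must appear at least N times".
 * Hypothetical helper for explanation only; this is not Elastic code.
 */
public class MinTermCount {

    /**
     * Counts whole-token occurrences of term in text. Lowercasing and splitting on
     * anything that isn't a letter or digit are simplifying assumptions; a real
     * Elastic analyzer may also stem, strip stop words, fold accents, and so on.
     */
    public static int countTerm(String text, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0; // an empty term never "appears"
        }
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    /** True when the term shows up at least minCount times. */
    public static boolean containsAtLeast(String text, String term, int minCount) {
        return countTerm(text, term) >= minCount;
    }

    public static void main(String[] args) {
        String article = "Apple shares rose. Analysts expect Apple to beat estimates; apple pie, not so much.";
        System.out.println(countTerm(article, "Apple"));          // 3
        System.out.println(containsAtLeast(article, "Apple", 3)); // true
        System.out.println(containsAtLeast(article, "Apple", 4)); // false
    }
}
```

Note that "pineapple" would not count toward "apple" here, since matching is on whole tokens. That's the same spirit as matching on indexed terms rather than substrings.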
The solution, it seems, is to write a custom plugin. I figured it out,
eventually, but it was a lot of trial and error, as the documentation I
was able to find is largely outdated or incomplete. So I figured I'd
write up what I learned while it's still fresh in my mind, in the hopes
that someone else might have an easier time of it. That's what internet
friends are for, after all.
Quick note before we start: all the version numbers you see are current
and working as of February 25, 2021. Hopefully this post ages well, but
if you try this out and hit issues, bumping the versions of Elastic,
Gradle, and maybe even Java is probably a good place to start. Also, I
use `projectname` a lot in the code examples --- that's not a special
word and you should change it to something that makes sense for you.
## 1. Set up a Java development environment
First off, you're gonna be writing some Java. That's not my usual
thing, so the first step was to get a working environment to compile my
code. To do that, we'll use [Docker](https://www.docker.com/). Here's
a `Dockerfile`:
```dockerfile
FROM adoptopenjdk/openjdk12:jdk-12.0.2_10-ubuntu
RUN apt-get update && \
@@ -70,17 +67,15 @@ your local working directory into `/plugin`:
`> docker run --rm -it -v ${PWD}:/plugin projectname-java bash`
## 2. Configure Gradle
[Gradle](https://gradle.org/) is a "build automation tool for
multi-language software development," and what Elastic recommends for
plugin development. Configuring Gradle to build the plugin properly was
the hardest part of this whole endeavor. Throw this into `build.gradle`
in your project root:
```gradle
buildscript {
repositories {
mavenLocal()
@@ -116,28 +111,26 @@ esplugin {
validateNebulaPom.enabled = false
```
You'll also need files named `LICENSE.txt` and `NOTICE.txt` --- mine
are empty, since the plugin is for internal use only. If you're going
to be releasing your plugin in some public way, maybe talk to a lawyer
about what to put in those files.
## 3. Write the dang plugin
To write the actual plugin, I started with [this example
plugin](https://github.com/elastic/elasticsearch/blob/master/plugins/examples/script-expert-scoring/src/main/java/org/elasticsearch/example/expertscript/ExpertScriptPlugin.java),
which scores a document based on the frequency of a given term. My use
case was fortunately quite similar, though I'm using a `filter` query,
meaning I just want a boolean, i.e. does this document contain this term
the requisite number of times? As such, I implemented a
[`FilterScript`](https://www.javadoc.io/doc/org.elasticsearch/elasticsearch/latest/org/elasticsearch/script/FilterScript.html)
rather than the `ScoreScript` implemented in the example code.
This file lives in (deep breath)
`src/main/java/com/projectname/containsmultiple/ContainsMultiplePlugin.java`:
```java
package com.projectname.containsmultiple;
import org.apache.lucene.index.LeafReaderContext;
@@ -311,26 +304,24 @@ public class ContainsMultiplePlugin extends Plugin implements ScriptPlugin {
}
```
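The main departure from the example plugin is what the script hands back. A `ScoreScript` produces a number that feeds relevance. A `FilterScript` produces a yes/no that decides whether a document stays in the results at all.

The sketch below illustrates just that contrast using made-up stand-in types: `DocScorer`, `DocFilter`, and an in-memory map of per-document term frequencies. It does not use Elastic's real classes, whose constructors and factories have shifted between versions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical stand-ins, NOT Elasticsearch's ScoreScript/FilterScript API:
 * shows a score-style script and a filter-style script over the same data.
 */
public class ScoreVsFilter {

    /** Score-style: each document gets a number, which feeds relevance ranking. */
    public interface DocScorer {
        double score(int docId);
    }

    /** Filter-style: each document gets a yes/no; "no" documents are dropped. */
    public interface DocFilter {
        boolean matches(int docId);
    }

    /** Like the example plugin: the score is simply how often the term occurs. */
    public static DocScorer frequencyScorer(Map<Integer, Integer> termFreqs) {
        return docId -> termFreqs.getOrDefault(docId, 0).doubleValue();
    }

    /** Our case: keep a document only if the term occurs at least minCount times. */
    public static DocFilter minCountFilter(Map<Integer, Integer> termFreqs, int minCount) {
        return docId -> termFreqs.getOrDefault(docId, 0) >= minCount;
    }

    /** Applies a filter to a list of document ids, preserving their order. */
    public static List<Integer> filterDocs(List<Integer> docIds, DocFilter filter) {
        return docIds.stream().filter(filter::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical term frequencies: docId -> occurrences (doc 4 never mentions the term).
        Map<Integer, Integer> freqs = Map.of(1, 5, 2, 1, 3, 3);
        List<Integer> docs = List.of(1, 2, 3, 4);

        DocScorer scorer = frequencyScorer(freqs);
        for (int doc : docs) {
            System.out.println("doc " + doc + " score=" + scorer.score(doc));
        }
        System.out.println(filterDocs(docs, minCountFilter(freqs, 3))); // [1, 3]
    }
}
```

With a threshold of 3, only documents 1 and 3 survive the filter. The scorer, by contrast, just assigns each of the four documents a number and drops nothing.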
## 4. Add it to ElasticSearch
With our code in place (and synced into our Docker container with a
mounted volume), it's time to compile it. In the Docker shell you
started up in step #1, build your plugin:
`> gradle build`
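Before copying anything, it can be reassuring to peek inside the zip that `gradle build` writes under `build/distributions/` (the file itself is described just below). Every Elasticsearch plugin has to ship a `plugin-descriptor.properties` file. This small sketch lists the archive's entries with `java.util.zip.ZipFile` and checks for that file.

The class name and the default path are illustrative assumptions, so point it at your own zip.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists a built plugin zip's contents so you can eyeball them. */
public class InspectPluginZip {

    /** Returns the name of every entry in the zip, in archive order. */
    public static List<String> entryNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Looks for the descriptor anywhere in the archive rather than assuming a layout. */
    public static boolean hasDescriptor(List<String> names) {
        return names.stream().anyMatch(n -> n.endsWith("plugin-descriptor.properties"));
    }

    public static void main(String[] args) throws IOException {
        // Default path matches the name/version used in this post; pass your own as args[0].
        String path = args.length > 0 ? args[0] : "build/distributions/contains-multiple-0.0.1.zip";
        List<String> names = entryNames(path);
        names.forEach(System.out::println);
        System.out.println(hasDescriptor(names) ? "descriptor found" : "no plugin-descriptor.properties!");
    }
}
```

On Java 11+ you can run it straight from source, e.g. `java InspectPluginZip.java build/distributions/contains-multiple-0.0.1.zip`.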
Assuming that works, you should now see a `build` directory with a bunch
of stuff in it. The file you care about is
`build/distributions/contains-multiple-0.0.1.zip` (though that'll
obviously change if you call your plugin something different or give it
a different version number). Grab that file and copy it to where you
plan to actually run ElasticSearch. For me, I placed it in a folder
called `.docker/elastic` in the main project repo. In that same
directory, create a new `Dockerfile` that'll actually run Elastic:
```dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:7.11.1
COPY .docker/elastic/contains-multiple-0.0.1.zip /plugins/contains-multiple-0.0.1.zip
@@ -341,37 +332,35 @@ RUN elasticsearch-plugin install
Then, in your project root, create the following `docker-compose.yml`:
```yaml
version: '3.2'
services:
  elasticsearch:
    image: projectname_elasticsearch
    build:
      context: .
      dockerfile: ./.docker/elastic/Dockerfile
    ports:
      - 9200:9200
    environment:
      - discovery.type=single-node
      - script.allowed_types=inline
      - script.allowed_contexts=filter
```
Those last couple lines are pretty important and your script won't work
without them. Build your image with `docker-compose build` and then
start Elastic with `docker-compose up`.
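Once the container is up, you can confirm the plugin actually loaded by asking the node through the `_cat/plugins` API. This sketch uses Java 11's `java.net.http.HttpClient`. It makes three assumptions:
- the `localhost:9200` port mapping from the compose file above;
- no authentication;
- a plugin name of `contains-multiple`. That name is a guess based on the zip's filename, so use whatever name your `build.gradle` declares.

The `h=component` parameter trims the output to one plugin name per line, which keeps the parsing trivial.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

/** Hypothetical helper: asks a local Elastic node which plugins it loaded. */
public class CheckPluginLoaded {

    /** Parses `_cat/plugins?h=component` output: one component (plugin) name per line. */
    public static boolean listsPlugin(String catOutput, String pluginName) {
        return Arrays.stream(catOutput.split("\\R"))
                .map(String::trim)
                .anyMatch(pluginName::equals);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed plugin name; pass the one from your build.gradle as args[0].
        String plugin = args.length > 0 ? args[0] : "contains-multiple";
        // Assumes the docker-compose port mapping (localhost:9200) and no security enabled.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:9200/_cat/plugins?h=component"))
                .GET()
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        System.out.println(listsPlugin(response.body(), plugin) ? plugin + " is loaded" : plugin + " NOT found");
    }
}
```

If the plugin doesn't show up, the `elasticsearch-plugin install` step in the Elastic `Dockerfile` is the first place to look.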
## 5. Use your plugin
To actually see the plugin in action, first create an index and add some
documents (I'll assume you're able to do this if you've read this far
into this post). Then, make a query with `curl` (or your Elastic wrapper
of choice), substituting `full_text`, `yabba` and `index_name` with
whatever makes sense for you:
```
> curl -H "content-type: application/json" \
-d '
{
@@ -398,7 +387,7 @@ whatever makes sense for you:
The result should be something like:
```json
{
"took" : 6,
"timed_out" : false,
@@ -422,6 +411,6 @@ The result should be something like:
...
```
So that's that, an ElasticSearch plugin from start to finish. I'm sure
there are better ways to do some of this stuff, and if you're aware of
any, let us know in the comments or write your own dang blog.