Trying out Elasticsearch
January 24, 2021
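What is Elasticsearch?
Elasticsearch is a search engine developed by Elastic.
Elastic recently announced a license change: from the Apache License, Version 2.0 (ALv2) to a dual license under the Server Side Public License (SSPL) and the Elastic License.
The SSPL, newly adopted as one half of the dual license, was created by MongoDB to push back against large cloud vendors such as AWS turning open source software into a service, and to restrict that kind of commercial service offering.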
- Ref:
As described above, Elastic appears to be taking the same approach as MongoDB.
In response, AWS has announced that it will fork the code and develop an open source version.
- Ref:
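The somewhat gossipy Elastic vs. AWS story is covered on sites such as the one below.
オープンソースの著作権を巡る Elastic と Amazon との闘い (The battle between Elastic and Amazon over open source copyright)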
Managed service pricing
As for managed Elasticsearch, AWS has a free tier, but the official Elastic Cloud unfortunately does not.
The AWS free tier is described as follows.
You can get started with Amazon Elasticsearch Service for free using the AWS Free Tier. For customers in the AWS Free Tier, Amazon Elasticsearch Service provides free usage of up to 750 hours per month of a t2.micro.elasticsearch or t3.small.elasticsearch instance, plus 10 GB per month of optional Amazon EBS storage (Magnetic or General Purpose). If you exceed the free tier limits, you are charged the Amazon Elasticsearch Service rates for the additional resources you use. See the offer terms for details.
Ref: Pricing - Amazon Elasticsearch Service | AWS
Trying it with Docker
Elastic provides an official docker image, so I will use that.
Install Elasticsearch with Docker | Elasticsearch Reference [7.10] | Elastic
Pull the docker image.
The image below is provided under the Elastic License. It contains the open source features, commercial features that are free to use, and paid features with a 30-day trial period.
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.2
Images under the Apache 2.0 license are said to be available at www.docker.elastic.co.
Starting a single node
Start Elasticsearch as a single node.
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
Checking the running node
$ curl -X GET "localhost:9200/_cat/nodes?v=true&pretty"
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
172.17.0.2           41          96  11    0.31    0.08     0.03 cdhilmrstw *      a7d579883d6a
Pulling the Kibana docker image
docker pull docker.elastic.co/kibana/kibana:7.10.2
docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.10.2
Replace YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID with the ID of the docker container that is running Elasticsearch.
You can look it up with docker ps, as shown below.
$ docker ps
CONTAINER ID   IMAGE                                                  COMMAND                  CREATED         STATUS         PORTS                                            NAMES
6a3f8789836f   docker.elastic.co/elasticsearch/elasticsearch:7.10.2   "/tini -- /usr/local…"   2 minutes ago   Up 2 minutes   0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   sad_borg
docker run --link 6a3f8789836f:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.10.2
Opening the URL below in a browser brings up the Kibana console.
http://localhost:5601/app/kibana#/dev_tools/console?load_from=https://www.elastic.co/guide/en/elasticsearch/reference/current/snippets/6.console
Loading data
From here on I follow the steps in the document below.
Start searching | Elasticsearch Reference [7.10] | Elastic
You can PUT a JSON document into Elasticsearch with curl like this.
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{"name": "John Doe"}'
The response looks like this.
$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
> {
>   "name": "John Doe"
> }
> '
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
Checking the data
curl -X GET "localhost:9200/customer/_doc/1?pretty"
$ curl -X GET "localhost:9200/customer/_doc/1?pretty"
{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}
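The same two requests can be sketched from Python. This is only an illustration using the standard library: the helper names build_index_request and build_get_request are mine, and BASE_URL assumes the node started with docker run above is listening on localhost:9200. The functions only build the requests; sending them needs a running node.

```python
import json
import urllib.request

# Assumption: the single node from the docker run above is on localhost:9200.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    url = f"{BASE_URL}/{index}/_doc/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_request(index, doc_id):
    """Build (but do not send) a GET request that fetches one document by ID."""
    return urllib.request.Request(f"{BASE_URL}/{index}/_doc/{doc_id}", method="GET")


req = build_index_request("customer", 1, {"name": "John Doe"})
print(req.get_method(), req.full_url)

# To actually send it (requires a running node):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything touches the cluster.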
Putting multiple documents at once
I use the sample JSON file below, which is provided for the official tutorial.
Download it.
curl -L -O https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
The data looks like this.
$ head accounts.json
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","employer":"Boink","email":"daleadams@boink.com","city":"Orick","state":"MD"}
{"index":{"_id":"20"}}
{"account_number":20,"balance":16418,"firstname":"Elinor","lastname":"Ratliff","age":36,"gender":"M","address":"282 Kings Place","employer":"Scentric","email":"elinorratliff@scentric.com","city":"Ribera","state":"WA"}
Each record appears to be a pair of lines: the first line is the index action carrying the document ID, and the second line is the document itself.
As shown below, the file has 2000 lines, so it contains 1000 documents.
$ wc -l accounts.json
2000 accounts.json
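To make the two-lines-per-document layout concrete, here is a small sketch that pairs each action line with the document line after it. The function name parse_bulk_lines is hypothetical, and it assumes every action is an "index" action followed by exactly one document line, which is what accounts.json looks like above.

```python
import json


def parse_bulk_lines(lines):
    """Pair each action line with the document line that follows it.

    Returns a list of (doc_id, document) tuples. Assumes every action is an
    "index" action followed by exactly one source line, as in accounts.json.
    """
    non_empty = [line for line in lines if line.strip()]
    if len(non_empty) % 2 != 0:
        raise ValueError("expected an even number of lines (action + document)")
    pairs = []
    for action_line, doc_line in zip(non_empty[0::2], non_empty[1::2]):
        action = json.loads(action_line)
        doc_id = action["index"]["_id"]
        pairs.append((doc_id, json.loads(doc_line)))
    return pairs


# Two records taken from the head output above (fields shortened).
sample = [
    '{"index":{"_id":"1"}}',
    '{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke"}',
    '{"index":{"_id":"6"}}',
    '{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond"}',
]
pairs = parse_bulk_lines(sample)
print(len(pairs))  # 4 lines give 2 documents
```

Passing the lines of the downloaded accounts.json to the same function should yield 1000 pairs, in line with the 2000 lines counted by wc.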
Upload the whole JSON file in one go with a _bulk request, as below.
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
Use the command below to confirm that the 1000 documents were indexed successfully.
curl "localhost:9200/_cat/indices?v=true"
In the row where index is bank, docs.count is 1000.
$ curl "localhost:9200/_cat/indices?v=true"
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                            9ounr8kaSOyJg3JaGsoImQ   1   1       1000            0    379.2kb        379.2kb
green  open   .apm-custom-link                jlYHKkSkTyGZGN4zdD-wjg   1   0          0            0       208b           208b
green  open   .kibana_task_manager_1          6EpHdzs_Q6qAQE3KLPnXtA   1   0          5           87     53.2kb         53.2kb
green  open   .apm-agent-configuration        asLwTTn5QeSf4pb9hXCg_g   1   0          0            0       208b           208b
green  open   .kibana_1                       BGDGmn40SFevoz3-yCp_fw   1   0         43           10      2.1mb          2.1mb
green  open   .kibana-event-log-7.10.2-000001 3mtucZvZT9uKK2huboPwfg   1   0          1            0      5.6kb          5.6kb
yellow open   customer                        Ahd0yuTvSO2cnd2jPHjNTQ   1   1          1            0      3.8kb          3.8kb
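If you want to check docs.count from a script rather than by eye, the _cat output can be parsed as a plain whitespace-separated table. The sketch below is a rough helper of my own (parse_cat_table and doc_count are not part of any Elasticsearch client), and it assumes no cell in the table is empty, since it splits purely on whitespace.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (with the ?v=true header) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        values = line.split()
        rows.append(dict(zip(header, values)))
    return rows


def doc_count(rows, index_name):
    """Return docs.count for the named index as an int, or None if it is absent."""
    for row in rows:
        if row.get("index") == index_name:
            return int(row["docs.count"])
    return None


# Two rows copied from the _cat/indices output above.
sample = """\
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank     9ounr8kaSOyJg3JaGsoImQ   1   1       1000            0    379.2kb        379.2kb
yellow open   customer Ahd0yuTvSO2cnd2jPHjNTQ   1   1          1            0      3.8kb          3.8kb
"""
rows = parse_cat_table(sample)
print(doc_count(rows, "bank"))
```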
Searching
You can search documents by calling the _search endpoint.
Issue a query like the one below.
This example targets the bank index and returns the results sorted by account number.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": { "match_all": {} },"sort": [{ "account_number": "asc" }]}'
By default, only the first 10 documents that match the query are returned.
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "query": { "match_all": {} },
>   "sort": [
>     { "account_number": "asc" }
>   ]
> }
> '
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {"account_number" : 0,"balance" : 16623,"firstname" : "Bradshaw","lastname" : "Mckenzie","age" : 29,"gender" : "F","address" : "244 Columbus Place","employer" : "Euron","email" : "bradshawmckenzie@euron.com","city" : "Hobucken","state" : "CO"},
        "sort" : [0]
      },
      ...
    ]
  }
}
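The response also contains the following information.
- took - how long Elasticsearch took to process the query, in milliseconds
- timed_out - whether the search request timed out (true or false)
- _shards - how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
- max_score - the score of the most relevant document found
- hits.total.value - the number of documents that matched the query
- hits.sort - the document's sort position (when the results are not sorted by relevance score)
- hits._score - the document's relevance score (not applicable when match_all is used)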
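As a quick way to see where those fields live in the response, the sketch below pulls them out of an already-parsed response dict. The function name summarize_search_response and the keys of the returned summary are my own choices; the sample dict is a trimmed copy of the response shown earlier.

```python
def summarize_search_response(response):
    """Pull the fields listed above out of a parsed _search response dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total": hits["total"]["value"],
        "max_score": hits["max_score"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Trimmed copy of the match_all response above (only the first hit kept).
sample_response = {
    "took": 8,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "0", "_score": None, "sort": [0]},
        ],
    },
}
summary = summarize_search_response(sample_response)
print(summary)
```

Note that max_score and _score are null here because the results are sorted by account_number rather than by relevance.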
To page through the search hits, include the from and size parameters in the request.
For example, the request below gets hits 10 through 19 (from is a zero-based offset).
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": { "match_all": {} },"sort": [{ "account_number": "asc" }],"from": 10,"size": 10}'
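The from value is just the page number multiplied by the page size. A minimal sketch, assuming zero-based page numbers (the page_body helper is hypothetical, and it hard-codes the same match_all query and sort as the request above):

```python
def page_body(page, page_size=10):
    """Build a match_all search body for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,         # number of hits per page
    }


# Page 1 with the default size reproduces the request above: from=10, size=10.
body = page_body(1)
print(body["from"], body["size"])
```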
Searching for documents that match on a specific field
To search for documents that match on a specific field, use a match query.
The example below searches for customers whose address field contains mill or lane.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": { "match": { "address": "mill lane" } }}'
The results come back in order of relevance, highest first.
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "query": { "match": { "address": "mill lane" } }
> }
> '
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 19, "relation" : "eq" },
    "max_score" : 9.507477,
    "hits" : [
      {"_index" : "bank","_type" : "_doc","_id" : "136","_score" : 9.507477,"_source" : {"account_number" : 136,"balance" : 45801,"firstname" : "Winnie","lastname" : "Holland","age" : 38,"gender" : "M","address" : "198 Mill Lane","employer" : "Neteria","email" : "winnieholland@neteria.com","city" : "Urie","state" : "IL"}},
      {"_index" : "bank","_type" : "_doc","_id" : "970","_score" : 5.4032025,"_source" : {"account_number" : 970,"balance" : 19648,"firstname" : "Forbes","lastname" : "Wallace","age" : 28,"gender" : "M","address" : "990 Mill Road","employer" : "Pheast","email" : "forbeswallace@pheast.com","city" : "Lopezo","state" : "AK"}},
      {"_index" : "bank","_type" : "_doc","_id" : "345","_score" : 5.4032025,"_source" : {"account_number" : 345,"balance" : 9812,"firstname" : "Parker","lastname" : "Hines","age" : 38,"gender" : "M","address" : "715 Mill Avenue","employer" : "Baluba","email" : "parkerhines@baluba.com","city" : "Blackgum","state" : "KY"}},
      {"_index" : "bank","_type" : "_doc","_id" : "472","_score" : 5.4032025,"_source" : {"account_number" : 472,"balance" : 25571,"firstname" : "Lee","lastname" : "Long","age" : 32,"gender" : "F","address" : "288 Mill Street","employer" : "Comverges","email" : "leelong@comverges.com","city" : "Movico","state" : "MT"}},
      {"_index" : "bank","_type" : "_doc","_id" : "1","_score" : 4.1042743,"_source" : {"account_number" : 1,"balance" : 39225,"firstname" : "Amber","lastname" : "Duke","age" : 32,"gender" : "M","address" : "880 Holmes Lane","employer" : "Pyrami","email" : "amberduke@pyrami.com","city" : "Brogan","state" : "IL"}},
      {"_index" : "bank","_type" : "_doc","_id" : "70","_score" : 4.1042743,"_source" : {"account_number" : 70,"balance" : 38172,"firstname" : "Deidre","lastname" : "Thompson","age" : 33,"gender" : "F","address" : "685 School Lane","employer" : "Netplode","email" : "deidrethompson@netplode.com","city" : "Chestnut","state" : "GA"}},
      {"_index" : "bank","_type" : "_doc","_id" : "556","_score" : 4.1042743,"_source" : {"account_number" : 556,"balance" : 36420,"firstname" : "Collier","lastname" : "Odonnell","age" : 35,"gender" : "M","address" : "591 Nolans Lane","employer" : "Sultraxin","email" : "collierodonnell@sultraxin.com","city" : "Fulford","state" : "MD"}},
      {"_index" : "bank","_type" : "_doc","_id" : "568","_score" : 4.1042743,"_source" : {"account_number" : 568,"balance" : 36628,"firstname" : "Lesa","lastname" : "Maynard","age" : 29,"gender" : "F","address" : "295 Whitty Lane","employer" : "Coash","email" : "lesamaynard@coash.com","city" : "Broadlands","state" : "VT"}},
      {"_index" : "bank","_type" : "_doc","_id" : "715","_score" : 4.1042743,"_source" : {"account_number" : 715,"balance" : 23734,"firstname" : "Tammi","lastname" : "Hodge","age" : 24,"gender" : "M","address" : "865 Church Lane","employer" : "Netur","email" : "tammihodge@netur.com","city" : "Lacomb","state" : "KS"}},
      {"_index" : "bank","_type" : "_doc","_id" : "449","_score" : 4.1042743,"_source" : {"account_number" : 449,"balance" : 41950,"firstname" : "Barnett","lastname" : "Cantrell","age" : 39,"gender" : "F","address" : "945 Bedell Lane","employer" : "Zentility","email" : "barnettcantrell@zentility.com","city" : "Swartzville","state" : "ND"}}
    ]
  }
}
To match the whole phrase mill lane rather than each word individually, use match_phrase as below.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": { "match_phrase": { "address": "mill lane" } }}'
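The difference between the two queries can be mimicked locally. The sketch below is a deliberately simplified model, not how Elasticsearch is implemented: it lowercases and splits on whitespace as a rough stand-in for the analyzer, treats match as "any term present", and treats match_phrase as "terms adjacent and in order". All three function names are my own.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a rough stand-in for the analyzer)."""
    return text.lower().split()


def matches_any_term(field_value, query):
    """Like a default match query: true if any query term appears in the field."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def matches_phrase(field_value, query):
    """Like match_phrase: true if the query terms appear adjacent and in order."""
    field_terms = tokenize(field_value)
    query_terms = tokenize(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(
        field_terms[i:i + n] == query_terms
        for i in range(len(field_terms) - n + 1)
    )


# Addresses taken from the match results above.
for address in ["198 Mill Lane", "990 Mill Road", "880 Holmes Lane"]:
    print(address, matches_any_term(address, "mill lane"), matches_phrase(address, "mill lane"))
```

Of the three sample addresses, all satisfy the match-style check but only 198 Mill Lane satisfies the phrase check, which fits with that document scoring highest in the match results.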
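To build more complex queries, you can combine multiple conditions with a bool query. must holds conditions that have to match, should holds conditions you would like to match, and must_not holds conditions that exclude a document from the results when they match.
The example below searches the bank index for customers who are 40 years old but do not live in Idaho.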
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": [{ "match": { "age": "40" } }],"must_not": [{ "match": { "state": "ID" } }]}}}'
must and should affect the relevance score.
By default, results are returned ranked by relevance score.
must_not is treated as a filter.
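When queries are assembled in code, it can help to build the bool body from its clauses. The sketch below is one possible helper (build_bool_query and its parameter names are hypothetical); it simply omits any clause that was not supplied and reproduces the body of the age/state example above.

```python
import json


def build_bool_query(must=None, should=None, must_not=None, filters=None):
    """Assemble a bool query body, leaving out clauses that were not given."""
    clauses = {
        "must": must,          # has to match, contributes to the score
        "should": should,      # nice to match, contributes to the score
        "must_not": must_not,  # excludes matching documents
        "filter": filters,     # has to match, does not affect the score
    }
    bool_body = {name: list(items) for name, items in clauses.items() if items}
    return {"query": {"bool": bool_body}}


# Same body as the curl example: age 40, but not living in Idaho.
body = build_bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(json.dumps(body, indent=2))
```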
You can also use filters explicitly to include or exclude documents.
The example below uses a range filter to pick out only the customers whose balance is between $20,000 and $30,000 (inclusive).
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": { "match_all": {} },"filter": {"range": {"balance": {"gte": 20000,"lte": 30000}}}}}}'
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "query": {
>     "bool": {
>       "must": { "match_all": {} },
>       "filter": {
>         "range": {
>           "balance": {
>             "gte": 20000,
>             "lte": 30000
>           }
>         }
>       }
>     }
>   }
> }
> '
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 217, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      {"_index" : "bank","_type" : "_doc","_id" : "49","_score" : 1.0,"_source" : {"account_number" : 49,"balance" : 29104,"firstname" : "Fulton","lastname" : "Holt","age" : 23,"gender" : "F","address" : "451 Humboldt Street","employer" : "Anocha","email" : "fultonholt@anocha.com","city" : "Sunriver","state" : "RI"}},
      {"_index" : "bank","_type" : "_doc","_id" : "102","_score" : 1.0,"_source" : {"account_number" : 102,"balance" : 29712,"firstname" : "Dena","lastname" : "Olson","age" : 27,"gender" : "F","address" : "759 Newkirk Avenue","employer" : "Hinway","email" : "denaolson@hinway.com","city" : "Choctaw","state" : "NJ"}},
      {"_index" : "bank","_type" : "_doc","_id" : "133","_score" : 1.0,"_source" : {"account_number" : 133,"balance" : 26135,"firstname" : "Deena","lastname" : "Richmond","age" : 36,"gender" : "F","address" : "646 Underhill Avenue","employer" : "Sunclipse","email" : "deenarichmond@sunclipse.com","city" : "Austinburg","state" : "SC"}},
      {"_index" : "bank","_type" : "_doc","_id" : "140","_score" : 1.0,"_source" : {"account_number" : 140,"balance" : 26696,"firstname" : "Cotton","lastname" : "Christensen","age" : 32,"gender" : "M","address" : "878 Schermerhorn Street","employer" : "Prowaste","email" : "cottonchristensen@prowaste.com","city" : "Mayfair","state" : "LA"}},
      {"_index" : "bank","_type" : "_doc","_id" : "203","_score" : 1.0,"_source" : {"account_number" : 203,"balance" : 21890,"firstname" : "Eve","lastname" : "Wyatt","age" : 33,"gender" : "M","address" : "435 Furman Street","employer" : "Assitia","email" : "evewyatt@assitia.com","city" : "Jamestown","state" : "MN"}},
      {"_index" : "bank","_type" : "_doc","_id" : "239","_score" : 1.0,"_source" : {"account_number" : 239,"balance" : 25719,"firstname" : "Chang","lastname" : "Boyer","age" : 36,"gender" : "M","address" : "895 Brigham Street","employer" : "Qaboos","email" : "changboyer@qaboos.com","city" : "Belgreen","state" : "NH"}},
      {"_index" : "bank","_type" : "_doc","_id" : "241","_score" : 1.0,"_source" : {"account_number" : 241,"balance" : 25379,"firstname" : "Schroeder","lastname" : "Harrington","age" : 26,"gender" : "M","address" : "610 Tapscott Avenue","employer" : "Otherway","email" : "schroederharrington@otherway.com","city" : "Ebro","state" : "TX"}},
      {"_index" : "bank","_type" : "_doc","_id" : "246","_score" : 1.0,"_source" : {"account_number" : 246,"balance" : 28405,"firstname" : "Katheryn","lastname" : "Foster","age" : 21,"gender" : "F","address" : "259 Kane Street","employer" : "Quantalia","email" : "katherynfoster@quantalia.com","city" : "Bath","state" : "TX"}},
      {"_index" : "bank","_type" : "_doc","_id" : "253","_score" : 1.0,"_source" : {"account_number" : 253,"balance" : 20240,"firstname" : "Melissa","lastname" : "Gould","age" : 31,"gender" : "M","address" : "440 Fuller Place","employer" : "Buzzopia","email" : "melissagould@buzzopia.com","city" : "Lumberton","state" : "MD"}},
      {"_index" : "bank","_type" : "_doc","_id" : "277","_score" : 1.0,"_source" : {"account_number" : 277,"balance" : 29564,"firstname" : "Romero","lastname" : "Lott","age" : 31,"gender" : "M","address" : "456 Danforth Street","employer" : "Plasto","email" : "romerolott@plasto.com","city" : "Vincent","state" : "VT"}}
    ]
  }
}
From here on, I follow the document below to learn how to run aggregations and analyze the results.
Analyze results with aggregations | Elasticsearch Reference [7.10] | Elastic
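With aggregations you can find out, for example, how many customers live in Texas, or the average balance of customer accounts in Tennessee.
You can search documents, filter them, and run aggregations to analyze the results, all in a single request.
The example below groups all accounts by state and returns the 10 states with the most accounts.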
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"aggs": {"group_by_state": {"terms": {"field": "state.keyword"}}}}'
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "size": 0,
>   "aggs": {
>     "group_by_state": {
>       "terms": {
>         "field": "state.keyword"
>       }
>     }
>   }
> }
> '
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {"key" : "TX","doc_count" : 30},
        {"key" : "MD","doc_count" : 28},
        {"key" : "ID","doc_count" : 27},
        {"key" : "AL","doc_count" : 25},
        {"key" : "ME","doc_count" : 25},
        {"key" : "TN","doc_count" : 25},
        {"key" : "WY","doc_count" : 25},
        {"key" : "DC","doc_count" : 24},
        {"key" : "MA","doc_count" : 24},
        {"key" : "ND","doc_count" : 24}
      ]
    }
  }
}
Aggregations can also be combined.
The example below computes the average balance within each group after grouping by state.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"aggs": {"group_by_state": {"terms": {"field": "state.keyword"},"aggs": {"average_balance": {"avg": {"field": "balance"}}}}}}'
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "size": 0,
>   "aggs": {
>     "group_by_state": {
>       "terms": {
>         "field": "state.keyword"
>       },
>       "aggs": {
>         "average_balance": {
>           "avg": {
>             "field": "balance"
>           }
>         }
>       }
>     }
>   }
> }
> '
{
  "took" : 10,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {"key" : "TX","doc_count" : 30,"average_balance" : {"value" : 26073.3}},
        {"key" : "MD","doc_count" : 28,"average_balance" : {"value" : 26161.535714285714}},
        {"key" : "ID","doc_count" : 27,"average_balance" : {"value" : 24368.777777777777}},
        {"key" : "AL","doc_count" : 25,"average_balance" : {"value" : 25739.56}},
        {"key" : "ME","doc_count" : 25,"average_balance" : {"value" : 21663.0}},
        {"key" : "TN","doc_count" : 25,"average_balance" : {"value" : 28365.4}},
        {"key" : "WY","doc_count" : 25,"average_balance" : {"value" : 21731.52}},
        {"key" : "DC","doc_count" : 24,"average_balance" : {"value" : 23180.583333333332}},
        {"key" : "MA","doc_count" : 24,"average_balance" : {"value" : 29600.333333333332}},
        {"key" : "ND","doc_count" : 24,"average_balance" : {"value" : 26577.333333333332}}
      ]
    }
  }
}
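Instead of sorting by document count, you can specify order inside the terms aggregation to sort by something else, here the result of the nested average_balance aggregation.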
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "size": 0,
>   "aggs": {
>     "group_by_state": {
>       "terms": {
>         "field": "state.keyword",
>         "order": {
>           "average_balance": "desc"
>         }
>       },
>       "aggs": {
>         "average_balance": {
>           "avg": {
>             "field": "balance"
>           }
>         }
>       }
>     }
>   }
> }
> '
{
  "took" : 19,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        {"key" : "CO","doc_count" : 14,"average_balance" : {"value" : 32460.35714285714}},
        {"key" : "NE","doc_count" : 16,"average_balance" : {"value" : 32041.5625}},
        {"key" : "AZ","doc_count" : 14,"average_balance" : {"value" : 31634.785714285714}},
        {"key" : "MT","doc_count" : 17,"average_balance" : {"value" : 31147.41176470588}},
        {"key" : "VA","doc_count" : 16,"average_balance" : {"value" : 30600.0625}},
        {"key" : "GA","doc_count" : 19,"average_balance" : {"value" : 30089.0}},
        {"key" : "MA","doc_count" : 24,"average_balance" : {"value" : 29600.333333333332}},
        {"key" : "IL","doc_count" : 22,"average_balance" : {"value" : 29489.727272727272}},
        {"key" : "NM","doc_count" : 14,"average_balance" : {"value" : 28792.64285714286}},
        {"key" : "LA","doc_count" : 17,"average_balance" : {"value" : 28791.823529411766}}
      ]
    }
  }
}
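To get a feel for what the terms and avg combination computes, here is a small local imitation over a handful of documents. It is only a sketch of the idea, not Elasticsearch's implementation: group_average is a hypothetical helper, and breaking count ties by key is my own choice (it happens to agree with the ordering seen in the sample output above).

```python
from collections import defaultdict


def group_average(documents, group_field, value_field, order_by="count", size=10):
    """Group documents, count each group, and average value_field per group.

    order_by="count" puts the largest groups first (ties broken by key);
    order_by="average" sorts by the per-group average, descending.
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [doc_count, sum of values]
    for doc in documents:
        bucket = totals[doc[group_field]]
        bucket[0] += 1
        bucket[1] += doc[value_field]
    buckets = [
        {"key": key, "doc_count": count, "average": total / count}
        for key, (count, total) in totals.items()
    ]
    if order_by == "count":
        buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    elif order_by == "average":
        buckets.sort(key=lambda b: -b["average"])
    else:
        raise ValueError("order_by must be 'count' or 'average'")
    return buckets[:size]


# A few state/balance pairs taken from documents shown earlier in this post.
docs = [
    {"state": "IL", "balance": 39225},
    {"state": "TN", "balance": 5686},
    {"state": "VA", "balance": 32838},
    {"state": "MD", "balance": 4180},
    {"state": "WA", "balance": 16418},
    {"state": "IL", "balance": 45801},
    {"state": "MD", "balance": 20240},
]
for bucket in group_average(docs, "state", "balance", order_by="average", size=3):
    print(bucket)
```

Switching order_by between "count" and "average" corresponds to leaving out or adding the order setting in the terms aggregation.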
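Elasticsearch also apparently provides specialized aggregations for analyzing things like dates, IP addresses, and geo data, as well as pipeline aggregations.
It is also said to offer machine learning features for things like anomaly detection.
If you want to learn more, see the document below.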
Where to go from here | Elasticsearch Reference [7.10] | Elastic
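Summary
I got hands-on with Elasticsearch to see what it is like.
This ended up being mostly a translated walk-through of the official tutorial.
I also started Kibana, but in the end I barely touched it.
Next time I would like to use it in a real service (for example, the search bar of this blog).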