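I. Introduction to Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data and helps you discover the expected and uncover the unexpected.
(Quoted from the official website)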
II. Installing es with Docker
Pairing it with Kibana is recommended.
1. Install es
- Start command
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xmx128m -Xms64m" -v /home/yanghuanxi/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -v /home/yanghuanxi/elasticsearch/data:/usr/share/elasticsearch/data -v /home/yanghuanxi/elasticsearch/plugins:/usr/share/elasticsearch/plugins -d elasticsearch:7.7.0
2. Install Kibana
- Start command
docker run -d -p 5601:5601 --link elasticsearch -e "ELASTICSEARCH_HOSTS=http://<vm-address>:9200" kibana:7.7.0
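Possible error:
The browser shows "Kibana server is not ready yet", and the logs show that Kibana cannot connect to es.
Kibana has to be told the es address at startup. There are two ways to do this. The first is the environment variable in the command above, which I have not tried.
The second is to enter the container, change the es address in the configuration file, and then restart the container.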
==⚠ Note: the directories mounted into the es and kibana containers must be readable and writable, for example:
chmod -R 777 elasticsearch/
==
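III. Basic commands
1. _cat
GET /_cat/nodes: list all nodes
GET /_cat/health: show the health of the es cluster
GET /_cat/master: show the master node
GET /_cat/indices: list all indices, roughly the equivalent of show databases; in MySQL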
2. Indexing (adding) a document
An example from the official sample data:
PUT /customer/_doc/1
{
"name": "John Doe"
}
POST /customer/_doc/1
{
"name": "John Doe"
}
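The same endpoint is also used to modify a document. The difference between PUT and POST is that PUT requires an explicit id, whereas POST also works without one, in which case es generates an id automatically.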
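To make the PUT/POST rule concrete, here is a minimal Python sketch that only computes the HTTP method and path for an indexing request. It does not talk to a cluster, and `index_request` is a hypothetical helper name of my own, not part of any es client.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_id: Optional[str] = None,
                  method: str = "POST") -> Tuple[str, str]:
    """Return (HTTP method, path) for indexing a single document.

    Hypothetical helper for illustration only: it encodes the rule that
    PUT needs an explicit id while POST may omit it.
    """
    if doc_id is None:
        if method == "PUT":
            raise ValueError("PUT requires an explicit document id")
        # Without an id, es generates one for the new document.
        return "POST", f"/{index}/_doc"
    return method, f"/{index}/_doc/{doc_id}"


print(index_request("customer", "1", method="PUT"))  # ('PUT', '/customer/_doc/1')
print(index_request("customer"))                     # ('POST', '/customer/_doc')
```

The second call is the case that PUT cannot express: the path carries no id, so the id in the response is whatever es generated.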
3. Retrieving a document
GET /customer/_doc/1
result >>>
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 26,
"_primary_term" : 4,
"found" : true,
"_source" : {
"name": "John Doe"
}
}
4. Bulk operations (_bulk)
Syntax:
Request
POST /_bulk
POST /<index>/_bulk
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
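The _bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body has to end with a newline. The following Python sketch builds such a body from the same four actions as above; `build_bulk_body` is a hypothetical helper of mine, and actually sending the body (with Content-Type: application/x-ndjson) is left out.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, source) pairs into an NDJSON _bulk body.

    `source` is None for actions that take no second line (delete).
    Hypothetical helper for illustration, not part of an es client.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


actions = [
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
    ({"create": {"_index": "test", "_id": "3"}}, {"field1": "value3"}),
    ({"update": {"_id": "1", "_index": "test"}}, {"doc": {"field2": "value2"}}),
]

body = build_bulk_body(actions)
print(body, end="")
```

Note that each JSON object stays on a single line; pretty-printing an action or a source across several lines would break the format.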
5. Updating
>>> With _update, _version is not incremented if the data did not change
POST fzkj/_update/1
{
"doc":{
"name":"test2",
"address":"北京2"
}
}
>>> With _doc, _version is incremented even if the data did not change
POST fzkj/_doc/1
{
"name":"test3",
"address":"北京3"
}
6. Deleting a document or an index
DELETE fzkj/_doc/1
DELETE fzkj
IV. Advanced queries
1. match_all
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
result >>>
{
"took" : 63,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value": 1000,
"relation": "eq"
},
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
By default the first 10 hits are returned. Use from and size to change this, and _source to choose which fields are returned.
GET /bank/_search
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" },
{ "xxx": "desc" }
],
"from": 10,
"size": 10,
"_source": ["name", "balance"]
}
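Since from is an offset and not a page number, a small conversion helps when paging through results. This is a minimal Python sketch; `page_to_from_size` is a hypothetical helper name, and pages are assumed to be numbered from 1.

```python
def page_to_from_size(page: int, page_size: int = 10) -> dict:
    """Translate a 1-based page number into es `from`/`size` values.

    Hypothetical helper for illustration; page numbering from 1 is my
    own assumption, es itself only knows offsets.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_to_from_size(1))  # {'from': 0, 'size': 10}
print(page_to_from_size(2))  # {'from': 10, 'size': 10}
```

Page 2 with a page size of 10 gives from 10 and size 10, the values used in the request above. Keep in mind that with default settings es rejects requests where from + size exceeds 10000 (index.max_result_window), so this style of paging is only suitable for the first pages of a result.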
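2. match (full-text search)
>>> Find documents whose address matches mill lane (the query text is analyzed, so a document containing either mill or lane matches)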
GET /bank/_search
{
"query": { "match": { "address": "mill lane" } }
}
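3. match_phrase (phrase matching)
>>> Find documents whose address contains the phrase mill lane (the terms must appear next to each other and in this order)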
GET /bank/_search
{
"query": { "match_phrase": { "address": "mill lane" } }
}
Besides match_phrase, you can also query the key.keyword sub-field, for example:
>>> Find documents whose address is exactly mill lane (the keyword sub-field is not analyzed)
GET /bank/_search
{
"query": { "match": { "address.keyword": "mill lane" } }
}
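The difference between the two:
match_phrase does phrase matching: the query is treated as one phrase, and a document matches as long as its field contains that phrase somewhere;
.keyword does an exact match: a document matches only if the whole field value is identical to the query value.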
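The three behaviours (match, match_phrase, and matching on .keyword) can be imitated with a toy model in plain Python. This is only a rough sketch: `analyze` lowercases and splits on whitespace, which is a simplification of what the real analyzer does, and the sample addresses are made up for illustration.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (a simplification)."""
    return text.lower().split()


def match(value, query):
    """True if at least one analyzed query term occurs in the field."""
    tokens = analyze(value)
    return any(term in tokens for term in analyze(query))


def match_phrase(value, query):
    """True if all query terms occur adjacent and in order in the field."""
    tokens, terms = analyze(value), analyze(query)
    if not terms:
        return False
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def keyword_equals(value, query):
    """True only if the raw, unanalyzed field value equals the query."""
    return value == query


# Illustrative sample values, not taken from a real index.
for address in ["198 Mill Lane", "990 Mill Road", "mill lane"]:
    print(address,
          match(address, "mill lane"),
          match_phrase(address, "mill lane"),
          keyword_equals(address, "mill lane"))
# 198 Mill Lane True True False
# 990 Mill Road True False False
# mill lane True True True
```

The first row is the interesting one: the address contains the phrase, so match_phrase succeeds, but the whole value is not identical to the query, so the .keyword comparison fails.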
4. multi_match (matching on multiple fields)
GET /bank/_search
{
"query": {
"multi_match": {
"query": "multi",
"fields": ["state", "address"]
}
}
}
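Finds documents whose state field or address field contains multi
5. must (bool query with must, must_not and should)
>>> Find documents with age = 40 and state != ID; the should clause is optional here and only raises the score of documents that also match it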
GET /bank/_search
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
],
"should": [{
"match": {"lastname": "zhangsan"}
}]
}
}
}
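When such bool queries are assembled in application code, it is handy to build the body from its clauses and drop the ones that are not used. A minimal Python sketch under that assumption follows; `bool_query` is a hypothetical helper of mine, not an es client function.

```python
import json


def bool_query(must=None, must_not=None, should=None, filter=None):
    """Build a search body with a bool query, omitting empty clauses.

    Hypothetical helper for illustration only.
    """
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filter,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
    should=[{"match": {"lastname": "zhangsan"}}],
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the request body above and can be pasted into the console after GET /bank/_search.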
6. filter (filters results without contributing to the relevance score)
>>> Find documents with balance between 20000 and 30000, both bounds included
GET /bank/_search
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
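The range operators differ only in whether the bound itself is included: gte and lte include it, gt and lt exclude it. A small Python sketch of that semantics, evaluated locally on plain numbers (`in_range` is a hypothetical helper, not an es API):

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Local imitation of the range operators on a single number.

    Hypothetical helper for illustration: gte/lte are inclusive,
    gt/lt are exclusive.
    """
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


print(in_range(20000, gte=20000, lte=30000))  # True, lower bound included
print(in_range(30000, gte=20000, lte=30000))  # True, upper bound included
print(in_range(20000, gt=20000, lt=30000))    # False, bound excluded
```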
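7. term queries
term and match are both used for searching. ==The difference is that term is meant for exact values, because the query value is not analyzed, while match is meant for full-text search==. For example, use term for age, gender and similar fields; use match for names, descriptions and other text.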
GET /bank/_search
{
"query": { "term": { "age": "90" } }
}
8. aggregations
- Find the age distribution and the average age of everyone whose address contains mill
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg": {
"avg": {
"field": "age"
}
}
}
}
result >>>
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2
},
{
"key" : 28,
"doc_count" : 1
},
{
"key" : 32,
"doc_count" : 1
}
]
},
"ageAvg" : {
"value" : 34.0
}
}
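To see what the two aggregations compute, the same numbers can be reproduced locally. In this Python sketch the list of ages is reconstructed from the buckets shown above (two hits aged 38, one aged 28, one aged 32); it is not fetched from es, and breaking ties by key is a choice I made to keep the output deterministic.

```python
from collections import Counter

# Ages of the four hits, reconstructed from the buckets in the result above.
ages = [38, 28, 32, 38]

# "terms" aggregation: count documents per distinct value,
# ordered by doc_count descending (ties broken by key here).
counts = Counter(ages)
buckets = [
    {"key": key, "doc_count": count}
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:10]
]

# "avg" aggregation: arithmetic mean over all hits.
age_avg = sum(ages) / len(ages)

print(buckets)
print(age_avg)  # 34.0
```

The list slice plays the role of "size": 10 in the terms aggregation: only the first 10 buckets are kept.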
- Aggregate by age and compute the average balance for each age
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
result >>>
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 38,
"doc_count" : 2,
"balanceAvg" : {
"value" : 27806.5
}
},
{
"key" : 28,
"doc_count" : 1,
"balanceAvg" : {
"value" : 19648.0
}
},
{
"key" : 32,
"doc_count" : 1,
"balanceAvg" : {
"value" : 25571.0
}
}
]
}
}
- Find the full age distribution and, for each age, the average balance of M, the average balance of F, and the overall average balance
GET bank/_search
{
"query": {
"match_all": {
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderAvg": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"balanceAvg":{
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
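V. Mapping
A mapping defines how the fields of an index are stored and indexed. For example, a mapping is used to define:
- which string fields should be treated as full-text fields
- which fields contain numbers, dates, or geo coordinates
- whether a field is indexed
- the format of date values
- custom rules that control the mapping of dynamically added fields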
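==⚠️: since 6.0.0 an index can contain only a single type; types are deprecated in 7.x and removed in 8.0. All data is stored directly under the index.==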
1. Creating a mapping
PUT /my-index
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
2. Adding a new field mapping
>>> "index": false means the field is not indexed, so it cannot be searched
PUT /my-index/_mapping
{
"properties": {
"employee-id": {
"type": "integer",
"index": false
}
}
}
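==⚠️: the mapping of an existing field cannot be changed; create a new index and migrate the data instead==
3. Migrating data (reindex)
First create a new index with the correct mapping, then migrate the data as follows: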
POST _reindex
{
"source": {
"index": "old"
},
"dest": {
"index": "new"
}
}
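The two steps, create the target index with the right mapping and then reindex into it, can be written down as an ordered list of requests. The Python sketch below only builds that list and prints it; `reindex_plan` is a hypothetical helper, and the mapping passed in is a stand-in for whatever the corrected mapping is.

```python
import json


def reindex_plan(old_index, new_index, new_mappings):
    """Return the requests for a mapping migration, in execution order.

    Hypothetical helper for illustration: it builds (method, path, body)
    tuples and sends nothing.
    """
    return [
        # 1. Create the target index with the corrected mapping.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy all documents from the old index into the new one.
        ("POST", "/_reindex", {
            "source": {"index": old_index},
            "dest": {"index": new_index},
        }),
    ]


# Example mapping only; replace it with the mapping you actually need.
plan = reindex_plan("old", "new", {"properties": {"age": {"type": "integer"}}})
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body))
```

The order matters: if the target index does not exist when _reindex runs, es creates it with dynamically guessed mappings, which defeats the purpose of the migration.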
VI. Installing the ik analyzer
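ik is a third-party plugin and is not bundled with es, so it has to be installed separately, in a version that matches the es version.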
The general approach:
Download the ik analyzer into the plugins directory of es, then edit ik's IKAnalyzer.cfg.xml file and adjust the dictionary locations in it.