
ElasticSearch-7.3.0 Advanced Syntax


Elasticsearch official documentation

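Field types
# Text: string type that is analyzed (tokenized) before indexing
# Keyword: string type that is not analyzed and can only be matched exactly
# Date: date type, can be combined with a format ({"type": "date", "format": "yyyy-MM-dd"})
# Numeric types: long, integer, short, double, etc.
# Boolean type: true, false
# Array: array type ["one", "two"]
# Object: nested JSON ({"property1": "value1", "property2": "value2"})
# Ip type: 127.0.0.1
# Geo_point: geographic location
	Mapping definition (in 7.x the mapping no longer takes a type name such as _doc):
	{
		"mappings": {
			"properties": {
				"location": {
					"type": "geo_point"
				}
			}
		}
	}
	Indexing a value:
	"location": {
		"lat": 41.12,
		"lon": -71.34
	}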
Advanced query syntax
The analyze process
# Use the analyze API to inspect how a text is tokenized
GET /movie/_analyze
{
  "field": "name",
  "text": "Eating an apple a day & keeps the doctor away"
}


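# Recreate the index with an explicit mapping (specifying the analyzer)
# If the movie index already exists, delete it first with DELETE /movie
PUT /movie
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}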
TMDB example
Downloading the data

Search online for "kaggle tmdb" to find and download the corresponding data files.

Creating the index
# Create the movie index
PUT /movie
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "tagline": {
        "type": "text",
        "analyzer": "english"
      },
      "release_date": {
        "type": "date",
        "format": "8yyyy/MM/dd||yyyy/M/dd||yyyy/MM/d||yyyy/M/d"
      },
      "popularity": {
        "type": "double"
      },
      "overview": {
        "type": "text",
        "analyzer": "english"
      },
      "cast": {
        "type": "object",
        "properties": {
          "character": {
            "type": "text",
            "analyzer": "standard"
          },
          "name": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
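match and term
# match query
GET /movie/_search
{
  "query": {
    "match": {
      "title": "steve zissou"
    }
  }
}

# term query
GET /movie/_search
{
  "query": {
    "term": {
      "title": {
        "value": "steve zissou"
      }
    }
  }
}

# A match query analyzes the query text with the analyzer configured for the field; a term query does not analyze it at all.
# In the two examples above, title uses the english analyzer, so the match query splits "steve zissou" into the two terms steve and zissou,
# and any document whose title contains either term is a hit.
# The term query looks up "steve zissou" as one unanalyzed term, so it only hits documents whose analyzed title contains exactly that term.
# Because the analyzer splits titles into single words at index time, no such term exists and this term query returns no hits.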
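The difference between match and term can be sketched with a toy inverted index in Python. This is only an illustration under simplifying assumptions: the `analyze` function below lowercases and splits on non-alphanumeric characters, which is far simpler than the real english analyzer (no stemming, no stop words), and the three sample titles are made up for the example.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase and split on non-alphanumerics (no stemming, no stop words)."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_index(docs):
    """Map each analyzed term to the set of document ids whose title contains it."""
    index = {}
    for doc_id, title in docs.items():
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index

def match_query(index, text):
    """Analyze the query text, then OR together the postings of every resulting term."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits

def term_query(index, value):
    """Look the value up verbatim, with no analysis."""
    return set(index.get(value, set()))

# Hypothetical sample titles, keyed by document id.
docs = {
    1: "The Life Aquatic with Steve Zissou",
    2: "Steve Jobs",
    3: "Space Jam",
}
index = build_index(docs)

print(match_query(index, "steve zissou"))  # documents containing steve OR zissou
print(term_query(index, "steve zissou"))   # empty: no indexed term equals "steve zissou"
print(term_query(index, "steve"))          # a single lowercase term can be found
```

The same reasoning explains why a term query on a text field is usually written with a single lowercase token, as in the filter examples later in this article.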
and / or after tokenization
# "or" logic after tokenization (the default)
GET /movie/_search
{
  "query": {
    "match": {
      "title": "basketball with cartoon aliens"
    }
  }
}

# "and" logic after tokenization
GET /movie/_search
{
  "query": {
    "match": {
      "title": {
        "query": "basketball with cartoon aliens",
        "operator": "and"
      }
    }
  }
}
Minimum number of matching terms
# At least 2 of the query terms must match
GET /movie/_search
{
  "query": {
    "match": {
      "title": {
        "query": "basketball love aliens",
        "operator": "or",
        "minimum_should_match": 2
      }
    }
  }
}
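The three matching rules (or, and, minimum_should_match) can be summarized in a few lines of Python. This is a simplified sketch, not how Elasticsearch is implemented: `matches` is a hypothetical helper, documents and queries are assumed to be already tokenized, and only an integer `minimum_should_match` is handled.

```python
def matches(doc_terms, query_terms, operator="or", minimum_should_match=1):
    """Decide whether a document matches, based on how many distinct query terms it contains."""
    unique_terms = set(query_terms)
    present = sum(1 for term in unique_terms if term in doc_terms)
    if operator == "and":
        # Every query term must be present.
        return present == len(unique_terms)
    # "or": at least minimum_should_match terms must be present.
    return present >= minimum_should_match

# Hypothetical analyzed title and the query terms from the example above.
doc = {"space", "basketball", "aliens"}
query = ["basketball", "love", "aliens"]

print(matches(doc, query))                           # True: one matching term is enough
print(matches(doc, query, operator="and"))           # False: "love" is missing
print(matches(doc, query, minimum_should_match=2))   # True: two of the three terms are present
```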
Phrase query
# Phrase query: the terms must appear next to each other, in the given order
GET /movie/_search
{
  "query": {
    "match_phrase": {
      "title": "steve zissou"
    }
  }
}
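What distinguishes a phrase query from a plain match is that term positions matter. A minimal Python sketch of that idea, assuming already-tokenized input and the default slop of 0 (the helper name `phrase_match` and the sample title are illustrative):

```python
def phrase_match(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occur in doc_tokens consecutively and in order."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

# Hypothetical analyzed title.
title = "the life aquatic with steve zissou".split()

print(phrase_match(title, ["steve", "zissou"]))   # True: adjacent and in order
print(phrase_match(title, ["zissou", "steve"]))   # False: wrong order
print(phrase_match(title, ["aquatic", "steve"]))  # False: not adjacent
```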
Scoring
# Inspect the score
GET /movie/_search
{
  "explain": true,
  "query": {
    "match": {
      "title": "steve"
    }
  }
}
======================================================
"details" : [
  {
    # 2.2 * 7.1592917 * 0.47008154 = 7.403992
    "value" : 7.403992,
    "description" : "score(freq=1.0), product of:",
    "details" : [
      {
        "value" : 2.2,
        # The boost can be set manually; when it is not set, the value shown here is 2.2,
        # which corresponds to the default query boost of 1.0 multiplied by (k1 + 1) with k1 = 1.2
        "description" : "boost",
        "details" : [ ]
      },
      {
        "value" : 7.1592917,
        # Inverse document frequency: as n grows, the idf decreases
        "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
        "details" : [
          {
            "value" : 3,
            # 3 documents contain the term
            "description" : "n, number of documents containing term",
            "details" : [ ]
          },
          {
            "value" : 4500,
            # 4500 documents in total have this field
            "description" : "N, total number of documents with field",
            "details" : [ ]
          }
        ]
      },
      {
        "value" : 0.47008154,
        "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
        "details" : [
          {
            "value" : 1.0,
            # Number of times the search term occurs in the document field
            "description" : "freq, occurrences of term within document",
            "details" : [ ]
          },
          {
            "value" : 1.2,
            "description" : "k1, term saturation parameter",
            "details" : [ ]
          },
          {
            "value" : 0.75,
            "description" : "b, length normalization parameter",
            "details" : [ ]
          },
          {
            "value" : 2.0,
            # Length of the document field
            "description" : "dl, length of field",
            "details" : [ ]
          },
          {
            "value" : 2.1757777,
            "description" : "avgdl, average length of field",
            "details" : [ ]
          }
        ]
      }
    ]
  }
]
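The numbers in the explain output can be checked by hand. The following Python sketch plugs the reported values into the two formulas quoted in the `description` fields; the function name `bm25_term_score` is just a label chosen for this example, and `log` is taken to be the natural logarithm, which is what reproduces the reported idf.

```python
import math

def bm25_term_score(boost, n, N, freq, k1, b, dl, avgdl):
    """Recompute score = boost * idf * tf from the values shown in the explain output."""
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return boost * idf * tf, idf, tf

score, idf, tf = bm25_term_score(
    boost=2.2, n=3, N=4500, freq=1.0, k1=1.2, b=0.75, dl=2.0, avgdl=2.1757777
)
# Up to float rounding, these agree with the reported 7.1592917, 0.47008154 and 7.403992.
print(idf, tf, score)
```

Raising `n` in the call (the term appears in more documents) lowers `idf`, and raising `dl` relative to `avgdl` (a longer field) lowers `tf`, which matches the comments in the explain output.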
Multi-field queries
# Multi-field query: each field is scored separately, and by default the document's score is the highest of the field scores
GET /movie/_search
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title", "overview"]
    }
  }
}

# Tuning a multi-field query: give the title field more weight
GET /movie/_search
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title^10", "overview"]
    }
  }
}

# Tuning a multi-field query: tie_breaker adds 0.3 times the scores of the other matching fields to the best field's score
GET /movie/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title^10", "overview"],
      "tie_breaker": 0.3
    }
  }
}

# bool query
# must: every clause must match
# must_not: no clause may match
# should: it is enough for one clause to match;
# the more clauses match, the higher the score
GET /movie/_search
{
  "explain": true,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "basketball with cartoon aliens"
          }
        },
        {
          "match": {
            "overview": "basketball with cartoon aliens"
          }
        }
      ]
    }
  }
}

# multi_match queries come in different types, and each type scores differently
# best_fields: the default; the highest field score becomes the document's score ("best match" mode) -> dis_max
GET /movie/_search
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title", "overview"],
      "type": "best_fields"
    }
  }
}

# dis_max
GET /movie/_search
{
  "explain": true,
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "basketball with cartoon aliens"
          }
        },
        {
          "match": {
            "overview": "basketball with cartoon aliens"
          }
        }
      ]
    }
  }
}

# Inspect how the query is rewritten: dis_max
GET /movie/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title^10", "overview"],
      "type": "best_fields"
    }
  }
}

# most_fields: takes all matching fields into account; the field scores are added together
GET /movie/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title", "overview"],
      "type": "most_fields"
    }
  }
}

# Weights are adjusted through each field's boost
GET /movie/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "basketball with cartoon aliens",
      "fields": ["title^10", "overview^0.1"],
      "type": "most_fields"
    }
  }
}

# cross_fields: scores term by term across the fields, suited to term-centric matching
GET /movie/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "steve jobs",
      "fields": ["title", "overview"],
      "type": "cross_fields"
    }
  }
}

GET /movie/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "steve jobs",
      "fields": ["title", "overview"],
      "type": "cross_fields"
    }
  }
}

# query_string
# Convenient use of AND, OR and NOT
GET /movie/_search
{
  "query": {
    "query_string": {
      "fields": ["title"],
      "query": "steve AND jobs"
    }
  }
}
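How the per-field scores are combined for best_fields (dis_max) and most_fields can be sketched as plain arithmetic. The Python below is an illustration only: the field scores are made-up numbers rather than output from the movie index, and the helper names are hypothetical. It covers the combination step, not the per-field scoring itself.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """dis_max-style combination: the best field score plus tie_breaker times the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)

def most_fields_score(field_scores):
    """most_fields-style combination: the scores of all matching fields are added together."""
    return sum(field_scores)

# Hypothetical per-field scores for one document.
title_score, overview_score = 2.0, 5.0

print(best_fields_score([title_score, overview_score]))                   # 5.0, the overview score wins
print(best_fields_score([title_score, overview_score], tie_breaker=0.3))  # 5.0 + 0.3 * 2.0
print(most_fields_score([title_score, overview_score]))                   # 2.0 + 5.0
print(best_fields_score([title_score * 10, overview_score]))              # with title^10 the title wins
```

With `tie_breaker` set to 0 only the best field counts, and with `tie_breaker` set to 1 the result equals the sum of all field scores, so the parameter slides between the two behaviours.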
Filtering and sorting
# filter query
# Single-condition filter
GET /movie/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "title": "steve"
        }
      }
    }
  }
}

# Multi-condition filter
GET /movie/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "title": "steve"
          }
        },
        {
          "term": {
            "cast.name": "gaspard"
          }
        }
      ]
    }
  }
}

# Multi-condition filter with range conditions
GET /movie/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "title": "steve"
          }
        },
        {
          "term": {
            "cast.name": "gaspard"
          }
        },
        {
          "range": {
            "release_date": {
              "lte": "2015/01/01"
            }
          }
        },
        {
          "range": {
            "popularity": {
              "gte": 25
            }
          }
        }
      ]
    }
  }
}

# Multi-condition filter with sorting
GET /movie/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "title": "steve"
          }
        },
        {
          "term": {
            "cast.name": "gaspard"
          }
        },
        {
          "range": {
            "release_date": {
              "lte": "2015/01/01"
            }
          }
        },
        {
          "range": {
            "popularity": {
              "gte": 25
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "popularity": {
        "order": "desc"
      }
    }
  ]
}

# filter combined with a scoring match: should controls the score, filter controls which documents pass
GET /movie/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "life"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "title": "steve"
          }
        },
        {
          "term": {
            "cast.name": "gaspard"
          }
        },
        {
          "range": {
            "release_date": {
              "lte": "2015/01/01"
            }
          }
        },
        {
          "range": {
            "popularity": {
              "gte": 25
            }
          }
        }
      ]
    }
  }
}
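Recall and precision
Recall: if there are n correct results in total and the query returns m of them, recall is m / n.
Precision: if the query returns n documents and m of them are correct, precision is m / n.
The two generally cannot both be maximized, but the order of the results can be adjusted.

In practice it is common to aim for high recall. Raising recall usually lowers precision, so make sure the m correct documents are ranked at the top of the results; this keeps the user experience good while preserving recall.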
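As a small worked example of the two definitions, the following Python sketch computes both values from a list of returned document ids and a set of correct ones. The ids are invented for the illustration and `precision_recall` is a hypothetical helper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved ids and a collection of relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 5 documents returned, 3 documents are actually correct,
# and 2 of the correct ones are among the returned documents.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4, 5], relevant=[1, 2, 6])
print(precision)  # 2 / 5
print(recall)     # 2 / 3
```

Returning more documents can only keep or raise recall, while precision typically drops, which is why ranking the correct documents first matters.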
Custom scoring
# function_score
# "query" is the original query; it produces the old score
# "field" in field_value_factor names the field whose value is used to adjust the score
GET /movie/_search
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "steve job",
          "fields": [
            "title",
            "overview"
          ],
          "operator": "or",
          "type": "most_fields"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "modifier": "log2p",
            "factor": 10
          }
        }
      ]
    }
  }
}

# function_score with two functions
# score_mode "sum": the values produced by the individual functions are added together
# boost_mode "sum": the combined function value is then added to the old score
GET /movie/_search
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "steve job",
          "fields": [
            "title",
            "overview"
          ],
          "operator": "or",
          "type": "most_fields"
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "modifier": "log2p",
            "factor": 10
          }
        },
        {
          "field_value_factor": {
            "field": "popularity",
            "modifier": "log2p",
            "factor": 5
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}
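The arithmetic behind the second request can be sketched in Python. This is an illustration under stated assumptions: the old score and the popularity value are made-up inputs, `log2p` is taken to mean log10(factor * value + 2), and only the "sum" and "multiply" modes are modelled. The helper names are hypothetical. Note that the defaults in this sketch follow the second request ("sum"), whereas a request that omits `score_mode` and `boost_mode`, like the first one, uses "multiply" for both.

```python
import math

def log2p(value, factor):
    """field_value_factor with modifier log2p: log10(factor * value + 2)."""
    return math.log10(factor * value + 2)

def combine(values, mode):
    """Combine a list of numbers with 'sum' or 'multiply'."""
    if mode == "sum":
        return sum(values)
    if mode == "multiply":
        result = 1.0
        for v in values:
            result *= v
        return result
    raise ValueError(f"unsupported mode: {mode}")

def function_score(old_score, popularity, factors, score_mode="sum", boost_mode="sum"):
    """Apply one log2p function per factor, combine them, then merge with the old score."""
    function_values = [log2p(popularity, factor) for factor in factors]
    combined = combine(function_values, score_mode)
    return combine([old_score, combined], boost_mode)

# Hypothetical inputs: old score 3.0 and popularity 9.8, with the factors 10 and 5 from the request.
new_score = function_score(old_score=3.0, popularity=9.8, factors=[10, 5])
print(new_score)  # 3.0 + log10(100) + log10(51)
```

Because of the logarithm, a movie that is ten times more popular gains only about one extra point per function, so popularity nudges the ranking instead of dominating the text relevance.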

xiao儿

Originally published as: ElasticSearch-7.3.0 Advanced Syntax

Theme test article, for testing purposes only. Published by 熱鬧獨處; please credit the source when reposting: http://www.cxybcw.com/147239.html
