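How to make shorter, closer token matches more relevant? (edge_ngram)
I'm getting odd results from the edge_ngram tokenizer that I use for an autocomplete feature, and I'm trying to figure out how to make my results more relevant. I copied the example from the Elasticsearch documentation. My documents have these descriptions: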
- "Apples, raw, without skin"
- "Apples, raw, golden delicious, with skin"
- "APPLEBEE'S, chili"
- "Babyfood, fruit, applesauce, junior"
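If I search for apple, "APPLEBEE'S, chili" scores higher than "Apples, raw, without skin".
If I search for apples, "Babyfood, fruit, applesauce, junior" scores higher than "Apples, raw, golden delicious, with skin".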
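In both cases I would expect the more relevant / shorter matches to score higher (i.e. when I search for apple or apples, results containing the word apples should score higher than applesauce or APPLEBEE'S).
My settings are: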
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
How can I make the more relevant matches score higher?
Answer
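This happens because of the length of the matched field, called dl, in the BM25 algorithm that Elasticsearch now uses for scoring: a term match in a shorter field counts for more. You can add the explain parameter to your query to see the details:
http://{{hostname}}:{{port}}/_search?explain=true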
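To make dl concrete, here is a rough Python sketch (not Elasticsearch code) that approximates what the autocomplete analyzer from the question emits at index time. The function name edge_ngram_tokens is made up for illustration, and the sketch assumes ASCII input so that asciifolding changes nothing. Every sample name gains an apple token, and the count for APPLEBEE'S,chili comes out at 11, which agrees with the dl value in the explain output for that document.

```python
import re

def edge_ngram_tokens(text, min_gram=2, max_gram=20):
    """Approximate the `autocomplete` analyzer: split on non-letters, emit
    prefixes of each word from min_gram to max_gram characters, lowercase."""
    tokens = []
    # token_chars: ["letter"] means any non-letter character ends a word.
    for word in re.findall(r"[^\W\d_]+", text):
        # Words shorter than min_gram produce no tokens at all.
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:size].lower())
    return tokens

# Sample names as indexed in the answer.
names = [
    "Apples,raw,without skin",
    "APPLEBEE'S,chili",
    "Babyfood,fruit,applesauce,junior",
    "Apples,golden delicious,with skin",
]

for name in names:
    tokens = edge_ngram_tokens(name)
    # Print the token count and whether the search term "apple" is among the tokens.
    print(len(tokens), "apple" in tokens, name)
```

Because the prefix apple is indexed for apples, applebee and applesauce alike, the term itself cannot tell the four documents apart, and the field length ends up deciding the order.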
Your "APPLEBEE'S,chili" document has the shortest field length, so it gets the higher score. This is the tf score for that document:
{
  "value": 0.5344296,
  "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
  "details": [
    {
      "value": 1.0,
      "description": "freq, occurrences of term within document",
      "details": []
    },
    {
      "value": 1.2,
      "description": "k1, term saturation parameter",
      "details": []
    },
    {
      "value": 0.75,
      "description": "b, length normalization parameter",
      "details": []
    },
    {
      "value": 11.0,
      "description": "dl, length of field",   ---> note this
      "details": []
    },
    {
      "value": 17.333334,
      "description": "avgdl, average length of field",
      "details": []
    }
  ]
}
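The formula in the description field can be checked by hand. The following is a minimal Python sketch of just that tf expression, fed with the values from the explain output above. bm25_tf is a hypothetical helper name, and the dl of 16 in the second call is only an illustrative longer field, not a value taken from the explain output.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """tf part of BM25 as printed by explain:
    freq / (freq + k1 * (1 - b + b * dl / avgdl))"""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Values from the explain output: freq=1.0, dl=11.0, avgdl=17.333334.
short_field = bm25_tf(1.0, 11.0, 17.333334)
print(round(short_field, 4))  # 0.5344

# Same term frequency in a longer field (illustrative dl=16) scores lower.
longer_field = bm25_tf(1.0, 16.0, 17.333334)
print(longer_field < short_field)  # True
```

With freq, k1 and b held fixed, dl is the only input that differs between documents, which is why the shortest field comes out on top.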
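Solution
You need to create another field that uses the english analyzer, as shown in the multi-fields example. The english analyzer stems apple and apples to the same root, so only documents that contain the whole word match on that subfield, which lifts them above the prefix-only matches. A complete example follows.
Index example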
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "autocomplete_search",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
Then index the sample documents:
{
"name" : "Apples,raw,without skin"
}
{
"name" : "APPLEBEE'S,chili"
}
{
"name" : "Babyfood,fruit,applesauce,junior"
}
{
"name" : "Apples,golden delicious,with skin"
}
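If it helps to load the four documents in one request, the sketch below builds a body for the _bulk API in Python. It only constructs the payload and does not send it. The index name edgelow and the ids are taken from the search results further down, while build_bulk_body is a hypothetical helper name.

```python
import json

# Sample documents from the answer, keyed by the ids seen in the search results.
DOCS = {
    "1": "Apples,raw,without skin",
    "2": "APPLEBEE'S,chili",
    "3": "Babyfood,fruit,applesauce,junior",
    "4": "Apples,golden delicious,with skin",
}

def build_bulk_body(index, docs):
    """Return an NDJSON body for POST /_bulk:
    one action line followed by one source line per document."""
    lines = []
    for doc_id, name in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"name": name}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body("edgelow", DOCS)
print(body)
```

The resulting string would be sent with the Content-Type header set to application/x-ndjson.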
And the search query:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "apple",
            "fields": [
              "name.english",
              "name"
            ]
          }
        }
      ]
    }
  }
}
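In the search results, the documents that contain apples now score higher:
"hits": [
  {
    "_index": "edgelow",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.6747451,
    "_source": {
      "name": "Apples,raw,without skin"
    }
  },
  {
    "_index": "edgelow",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.60996956,
    "_source": {
      "name": "Apples,golden delicious,with skin"
    }
  },
  {
    "_index": "edgelow",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.12822598,
    "_source": {
      "name": "APPLEBEE'S,chili"
    }
  },
  {
    "_index": "edgelow",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.09446116,
    "_source": {
      "name": "Babyfood,fruit,applesauce,junior"
    }
  }
]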