How to work around boolean similarity: is there a way to remove duplicates?
Given the following index:
PUT /test_index
{
  "mappings": {
    "properties": {
      "field1": {
        "type": "text",
        "analyzer": "whitespace",
        "similarity": "boolean"
      },
      "field2": {
        "type": "text",
        "similarity": "boolean"
      }
    }
  }
}
and the following data:
POST /test_index/_bulk?refresh=true
{ "index" : {} }
{ "field1": "foo","field2": "bar"}
{ "index" : {} }
{ "field1": "foo1 foo2","field2": "bar1 bar2"}
{ "index" : {} }
{ "field1": "foo1 foo2 foo3","field2": "bar1 bar2 bar3"}
and the following query against these boolean-similarity fields:
POST /test_index/_search
{
  "size": 10,
  "min_score": 0.4,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "fuzzy": {
                "field1": { "value": "foo", "fuzziness": "AUTO", "boost": 1 }
              }
            },
            {
              "fuzzy": {
                "field2": { "value": "bar", "boost": 1 }
              }
            }
          ]
        }
      }
    }
  }
}
I always get ["foo1 foo2 foo3", "bar1 bar2 bar3"] as the top hit, even though the index contains an exact match (the first document):
{
  "took": 114,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": 3.9999998,
    "hits": [
      {
        "_index": "test_index", "_type": "_doc", "_id": "bXw8eXUBCTtfNv84bNPr", "_score": 3.9999998,
        "_source": { "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3" }
      },
      {
        "_index": "test_index", "_type": "_doc", "_id": "bHw8eXUBCTtfNv84bNPr", "_score": 2.6666665,
        "_source": { "field1": "foo1 foo2", "field2": "bar1 bar2" }
      },
      {
        "_index": "test_index", "_type": "_doc", "_id": "a3w8eXUBCTtfNv84bNPr", "_score": 2.0,
        "_source": { "field1": "foo", "field2": "bar" }
      }
    ]
  }
}
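I'm aware that boolean similarity matches as many terms as it can, and I know I could apply a rescore here, but that is not an option because I don't know how many top-N results to fetch.
Are there any other options? Perhaps I could write my own similarity plugin, based on boolean similarity, that removes duplicates and keeps only the best-matching token, but I don't know where to start; I have only seen examples for scripts and rescoring.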
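The ranking above follows from how boolean similarity scores: every matching term contributes its query boost regardless of term frequency, and a fuzzy query is expanded into one clause per matching indexed term. The sketch below reproduces the three scores under one assumption that the question does not state: each expanded term is weighted by 1 - edits / min(len(query), len(term)). The helper names are hypothetical, the distance is plain Levenshtein (no transpositions), and the model ignores Elasticsearch's 32-bit float rounding.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def max_edits_auto(term: str) -> int:
    """Edit budget for fuzziness AUTO: 0 for 1-2 chars, 1 for 3-5, 2 above."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_field_score(query: str, field_text: str, boost: float = 1.0) -> float:
    """Sum the weight of every distinct indexed term within the edit budget."""
    total = 0.0
    for term in set(field_text.split()):
        edits = edit_distance(query, term)
        if edits <= max_edits_auto(query):
            # Assumed per-term weight; chosen because it reproduces the scores above.
            total += boost * (1 - edits / min(len(query), len(term)))
    return total


docs = [
    {"field1": "foo", "field2": "bar"},
    {"field1": "foo1 foo2", "field2": "bar1 bar2"},
    {"field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3"},
]

# The two should clauses of the bool query are added together.
scores = [
    fuzzy_field_score("foo", d["field1"]) + fuzzy_field_score("bar", d["field2"])
    for d in docs
]
for d, s in zip(docs, scores):
    print(f'{d["field1"]!r}: {s:.4f}')
```

Under this model the scores come out at roughly 2.0, 2.67 and 4.0. The third document wins because six distinct near-matches each add 2/3, while the exact document collects only 1.0 per field. The similarity itself never sees duplicates; the accumulation comes from the fuzzy expansion.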
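Solution
Update: I have revised this answer based on the clarification provided in the comments on my previous answer.
The following query returns the expected results: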
{
  "min_score": 0.4,
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "fuzzy": {
                "field1": { "value": "foo", "fuzziness": "AUTO", "boost": 0.5 }
              }
            },
            {
              "term": {                          --> used for boosting the exact terms
                "field1": {
                  "value": "foo",
                  "boost": 1.5                   --> further boosting the exact match
                }
              }
            }
          ]
        }
      }
    }
  }
}
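Why the added term clause is enough for this data can be checked with exact arithmetic. The sketch assumes, as the scores in the question suggest, that boolean similarity adds the clause boost for every matching term and that a fuzzy match one edit away from a three-letter query is weighted at 2/3 of the fuzzy boost. `predicted_score` and `near_matches_to_outrank` are illustrative names, not Elasticsearch APIs.

```python
from fractions import Fraction

FUZZY_BOOST = Fraction(1, 2)   # boost on the fuzzy clause
EXACT_BOOST = Fraction(3, 2)   # boost on the term clause
NEAR_WEIGHT = Fraction(2, 3)   # assumed weight of a one-edit match on "foo"


def predicted_score(has_exact: bool, near_matches: int) -> Fraction:
    """Score of a document under the assumed additive model."""
    # An exact token matches the fuzzy clause at full weight and the term clause.
    exact_part = FUZZY_BOOST + EXACT_BOOST if has_exact else Fraction(0)
    near_part = near_matches * FUZZY_BOOST * NEAR_WEIGHT
    return exact_part + near_part


def near_matches_to_outrank() -> int:
    """Smallest number of distinct one-edit terms that beats an exact-only document."""
    target = predicted_score(True, 0)
    n = 0
    while predicted_score(False, n) <= target:
        n += 1
    return n


print(float(predicted_score(True, 0)))    # document "foo"
print(float(predicted_score(False, 2)))   # document "foo1 foo2"
print(float(predicted_score(False, 3)))   # document "foo1 foo2 foo3"
print(near_matches_to_outrank())
```

The predictions are 2, 2/3 and 1, which puts the exact document first. Under the same assumptions a field would need seven distinct one-edit terms before it overtook the exact match, so the boosts raise the threshold rather than remove the accumulation.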
And the search results:
"hits": [
  {
    "_index": "test_index", "_type": "_doc", "_id": "zdMEvHUBlo4-1mHbtvNH", "_score": 2.0,
    "_source": { "field1": "foo", "field2": "bar" }
  },
  {
    "_index": "test_index", "_type": "_doc", "_id": "z9MEvHUBlo4-1mHbtvNH", "_score": 0.99999994,
    "_source": { "field1": "foo1 foo2 foo3", "field2": "bar1 bar2 bar3" }
  },
  {
    "_index": "test_index", "_type": "_doc", "_id": "ztMEvHUBlo4-1mHbtvNH", "_score": 0.6666666,
    "_source": { "field1": "foo1 foo2", "field2": "bar1 bar2" }
  }
]
Another query, without a boost on the exact term, also returns the expected results:
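{
  "min_score": 0.4,
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "fuzzy": {
                "field1": { "value": "foo", "fuzziness": "AUTO", "boost": 0.5 }
              }
            },
            {
              "term": {
                "field1": {
                  "value": "foo"                 --> notice there is no boost
                }
              }
            }
          ]
        }
      }
    }
  }
}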
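The queries in this answer share one shape, so they can be generated from a single helper. This is a minimal sketch using only the Python standard library. The function names, the `http://localhost:9200` address and the absence of authentication are assumptions; the index and field names come from the question.

```python
import json
import urllib.request


def build_query(value, field="field1", fuzzy_boost=0.5, exact_boost=None,
                min_score=0.4, size=10):
    """Build a fuzzy + exact-term search body; exact_boost=None omits the boost."""
    term_clause = {"value": value}
    if exact_boost is not None:
        term_clause["boost"] = exact_boost
    should = [
        {"fuzzy": {field: {"value": value, "fuzziness": "AUTO",
                           "boost": fuzzy_boost}}},
        {"term": {field: term_clause}},
    ]
    return {
        "min_score": min_score,
        "size": size,
        # Same wrapper as the queries above: function_score around a bool query.
        "query": {"function_score": {"query": {"bool": {"should": should}}}},
    }


def search(body, index="test_index", host="http://localhost:9200"):
    """POST the body to the _search endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the boosted variant; call search(...) on it against a running cluster.
    print(json.dumps(build_query("foo", exact_boost=1.5), indent=2))
```

Calling `build_query("foo", exact_boost=1.5)` produces the boosted variant, and `build_query("foo")` produces the variant with no boost on the term clause.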