How to make an Elasticsearch suggester match every word in an indexed item, not only the first word
'settings' => array(
    'analysis' => array(
        'analyzer' => array(
            'stop_analyzer' => array(
                'type' => 'custom',
                'tokenizer' => 'standard',
                'filter' => array(
                    'lowercase',
                    'english_stop'
                )
            )
        ),
        "filter" => array(
            "english_stop" => array(
                "type" => "stop",
                "stopwords" => "_english_"
            )
        )
    )
),
'mappings' => array(
    'properties' => array(
        'texts' => array(
            'type' => 'completion',
            "analyzer" => "stop_analyzer",
            "search_analyzer" => "stop_analyzer",
            'preserve_position_increments' => false
        ),
    ),
)
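This works well when the suggestion search starts from the beginning of the text (with or without stop words). However, when the index contains "This is the text" and I search for "text", I get no results. What is the correct way to do this? I would prefer not to use n-grams.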
'suggest' => array(
    'suggestion' => array(
        'prefix' => 'text',
        'completion' => array(
            'field' => 'texts'
        )
    )
)
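Solution
Based on the comments from the user, I am adding another answer that uses n-grams to search on all the words. The regex-based approach also works, but regular expressions are quite expensive.
Here is a working example with index mapping, index data, search query, and search result.
Index mapping: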
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": ["and", "is", "the"]
        },
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "ngram_filter", "my_custom_stop_words_filter"]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
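Analyze API (run it against the index that defines ngram_analyzer):
POST /<index-name>/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "This is the text"
}
This generates the following tokens: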
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "text",
      "start_offset": 12,
      "end_offset": 16,
      "position": 3
    }
  ]
}
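To see why only these two tokens survive, the analyzer chain can be approximated in a few lines of Python. This is a rough sketch rather than Elasticsearch's actual implementation: the regex-based split is only a stand-in for the standard tokenizer, and the names `ngrams` and `analyze` are made up for illustration.

```python
import re

# Same stop words as my_custom_stop_words_filter in the mapping above.
STOP_WORDS = {"and", "is", "the"}


def ngrams(token, min_gram=4, max_gram=20):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # longer grams from this start would run past the token
            grams.append(token[start:start + size])
    return grams


def analyze(text):
    """Approximate ngram_analyzer: tokenize, lowercase, n-gram, drop stop words."""
    words = re.findall(r"\w+", text)  # assumption: rough stand-in for the standard tokenizer
    tokens = []
    for word in words:
        for gram in ngrams(word.lower()):
            if gram not in STOP_WORDS:
                tokens.append(gram)
    return tokens


print(analyze("This is the text"))  # ['this', 'text']
```

In this sketch "is" and "the" produce no grams at all because they are shorter than min_gram, which is consistent with the analyze output above. It also suggests a limitation worth checking on a real index: since the search analyzer is standard, a query shorter than four characters (for example "tex") has no matching indexed gram.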
Index data:
{
  "title": [
    "This is the text"
  ]
}
Search query:
{
  "query": {
    "match": {
      "title": "text"
    }
  }
}
Search result:
"hits": [
  {
    "_index": "stof_29753971",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.41978103,
    "_source": {
      "title": [
        "This is the text"
      ]
    }
  }
]
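The best way to get completion suggestions that can match in the middle of a field is an n-gram filter.
However, since you do not want to use n-grams, you can try the following approach:
You can use multiple suggesters: one based on a prefix, and one that uses a regex to match in the middle of the field.
Here is a working example with index mapping, data, search query, and search result.
Index mapping: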
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_words_filter": {
          "type": "stop",
          "stopwords": ["and", "the"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "my_custom_stop_words_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword"
      },
      "suggest": {
        "type": "completion",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
Index data:
{
  "suggest": [
    {
      "input": "This is the text"
    }
  ]
}
{
  "suggest": [
    {
      "input": "Software Manager"
    }
  ]
}
Search query:
{
  "suggest": {
    "suggest-exact": {
      "prefix": "text",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    },
    "suggest-regex": {
      "regex": ".*text.*",
      "completion": {
        "field": "suggest",
        "skip_duplicates": true
      }
    }
  }
}
Search result:
"suggest": {
  "suggest-exact": [
    {
      "text": "text",
      "offset": 0,
      "length": 4,
      "options": []
    }
  ],
  "suggest-regex": [
    {
      "text": ".*text.*",
      "length": 8,
      "options": [
        {
          "text": "This is the text",
          "_index": "test",
          "_score": 1.0,
          "_source": {
            "suggest": [
              {
                "input": "This is the text"
              }
            ]
          }
        }
      ]
    }
  ]
}
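The difference between the two suggesters in this result can be illustrated with a small Python sketch. It is only a model of the matching behaviour, not how the completion suggester is implemented: it compares raw input strings and ignores analysis (tokenizer, stop words), and the helper names `prefix_suggest` and `regex_suggest` are hypothetical. `re.fullmatch` is used because Lucene regular expressions are anchored to the whole value, which is why the query needs `.*` on both sides.

```python
import re

# The two inputs indexed in the example above.
INPUTS = ["This is the text", "Software Manager"]


def prefix_suggest(prefix, candidates):
    """Model of a prefix suggestion: the input must start with the prefix."""
    wanted = prefix.lower()
    return [c for c in candidates if c.lower().startswith(wanted)]


def regex_suggest(pattern, candidates):
    """Model of a regex suggestion: the pattern must match the whole input."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [c for c in candidates if compiled.fullmatch(c)]


print(prefix_suggest("text", INPUTS))     # []
print(prefix_suggest("this", INPUTS))     # ['This is the text']
print(regex_suggest(".*text.*", INPUTS))  # ['This is the text']
print(regex_suggest("text", INPUTS))      # []
```

The prefix lookup is cheap because it only follows the start of each input, while a pattern with a leading `.*` has to consider every input. That is the cost trade-off between this approach and the n-gram mapping.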