Extract keywords (multi word) from text using Elasticsearch and return the offsets of the search terms

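I want to extract many keywords from a query text and report where each keyword occurs in that text (its offset). Here is my progress so far: I created two custom analyzers, one keyword-based and one shingle-based: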

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["asciifolding", "lowercase"]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle"]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}
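
For readers unfamiliar with this pairing: the keyword tokenizer indexes each stored keyword as a single token, while the shingle filter turns the query text into word n-grams at search time, so a multi-word keyword can match one of those n-grams exactly. The sketch below imitates that idea in plain Python. It is not Elasticsearch code; the function name `shingles`, the `\w+` tokenization, and the size defaults (single words plus two-word shingles, which I believe mirrors the filter's defaults) are assumptions made for illustration.

```python
import re

def shingles(text, min_size=1, max_size=2):
    """Return (shingle, start, end) tuples of lowercased word n-grams.

    Rough imitation of "standard tokenizer + lowercase + shingle";
    start/end are 0-based character offsets into text (end exclusive).
    """
    # Collect each word together with its character span.
    words = [(m.group().lower(), m.start(), m.end())
             for m in re.finditer(r"\w+", text)]
    out = []
    for i in range(len(words)):
        for n in range(min_size, max_size + 1):
            if i + n > len(words):
                break
            chunk = words[i:i + n]
            # A shingle spans from the first word's start to the last word's end.
            out.append((" ".join(w for w, _, _ in chunk),
                        chunk[0][1], chunk[-1][2]))
    return out

result = shingles("I like Python programming")
```

Here `result` contains both the single word `"python"` and the two-word shingle `"python programming"`, which is the form that would equal a keyword stored as one token.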

These are the keywords I am talking about:

{
  "hits": {
    "total": 2000,
    "hits": [
      { "id": 1, "keyword": "python programming" },
      { "id": 2, "keyword": "facebook" },
      { "id": 3, "keyword": "Microsoft" },
      { "id": 4, "keyword": "NLTK" },
      { "id": 5, "keyword": "Natural language processing" }
    ]
  }
}

I ran a query like this:

{
  "query": {
    "match": {
      "keyword": "I post a lot of things on Facebook and quora"
    }
  }
}

and got the following response:

{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "2",
        "_score": 0.009332742,
        "_source": { "id": 2, "keyword": "facebook" }
      },
      {
        "_index": "test",
        "_type": "your_type",
        "_id": "4",
        "_score": 0.009207102,
        "_source": { "id": 4, "keyword": "quora" }
      }
    ]
  }
}

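But I don't know the offsets of those matched words in the text. For example, I would like to know that quora starts at index 40. I do not want them highlighted by wrapping them in tags or anything like that.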
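
To make the desired output concrete, here is a minimal client-side sketch, assuming the matched keywords have already been pulled out of the `_source` of each hit. It simply re-scans the original text for each keyword, case-insensitively, and is not an Elasticsearch feature; the function name `keyword_offsets` is hypothetical. Offsets are 0-based, so quora is reported at 39, which corresponds to position 40 when counting from 1.

```python
import re

def keyword_offsets(text, keywords):
    """Map each keyword to the 0-based (start, end) spans where it occurs.

    Matching is case-insensitive and anchored on word boundaries, so this
    only suits keywords that begin and end with a word character.
    """
    result = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        spans = [m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
        if spans:
            result[kw] = spans
    return result

text = "I post a lot of things on Facebook and quora"
# Keywords as they would come back in the search hits.
offsets = keyword_offsets(text, ["facebook", "quora"])
print(offsets)
```

If the offsets should come from Elasticsearch itself, the `_analyze` API may be worth a look, since it reports a `start_offset` and `end_offset` for every token it produces; whether that fits this mapping is something I have not verified.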

I should mention that my question builds on this earlier post:

Extract keywords (multi word) from text using elastic search
