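How to fix an Elasticsearch term query that returns 0 hits even though the document exists
I have an ES domain, and when I query by a document's emailId field I get no hits, even though the field and value exist in the document. Querying the same document by employeeId works.
Here is what my index mapping looks like.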
{
  "properties": {
    "employeeId": {
      "type": "text",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    },
    "emailId": {
      "type": "text",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }
  }
}
Here is how I run the search.
public SearchResponse searchForExactDocument(final String indexName, final Map<String, Object> queryMap)
        throws IOException {
    // Build one term clause per field; all clauses must match.
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
    queryMap.forEach((name, value) -> {
        queryBuilder.must(QueryBuilders.termQuery(name, value));
    });
    return this.executeSearch(indexName, queryBuilder);
}

private SearchResponse executeSearch(final String indexName, final QueryBuilder queryBuilder) throws IOException {
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(queryBuilder);
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices(indexName);
    searchRequest.source(searchSourceBuilder);
    return restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
}
I ran SearchRequest.source().toString(), and below is the source string I got for the search.
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "emailId": {
              "value": "21june6lambdatest7@gmail.com",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}
Below is the document that should be returned, but I get no hits.
index {[person][_doc][null], source[{"firstName": "MyEmployee", "lastName": "June6Test7", "emailId": "21june6lambdatest7@gmail.com", "employeeId": "13908528"}]}
I find it very strange that querying by employeeId works while querying by emailId does not. Any help would be appreciated.
Update: here is my index creation method.
public CreateIndexResponse createIndex(final CreateIndexInput createIndexInput) throws IOException {
    CreateIndexRequest createIndexRequest = new CreateIndexRequest(createIndexInput.indexName());
    Settings.Builder settingsBuilder = Settings.builder();
    settingsBuilder.put(NUMBER_OF_SHARDS_KEY, createIndexInput.numOfShards());
    settingsBuilder.put(NUMBER_OF_REPLICAS, createIndexInput.numOfReplicas());
    // Defines a custom analyzer named custom_uax_url_email backed by the uax_url_email tokenizer.
    settingsBuilder.put("analysis.analyzer.custom_uax_url_email.tokenizer", "uax_url_email");
    createIndexInput.mapping().ifPresent(mapping ->
            createIndexRequest.mapping(mapping, XContentType.JSON));
    createIndexRequest.settings(settingsBuilder.build());
    return restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
}
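Solution
A term query returns documents that contain the exact term in the given field, and it does not analyze the search value. You need to query the emailId.keyword sub-field instead of emailId (note the ".keyword" after the field name). That sub-field is of type keyword, so the value is indexed as a single, unanalyzed term instead of being split by the standard analyzer.
If no analyzer is specified, a text field uses the standard analyzer by default. It breaks "21june6lambdatest7@gmail.com" into the following tokens: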
{
  "tokens": [
    {
      "token": "21june6lambdatest7",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "gmail.com",
      "start_offset": 19,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
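To make the mismatch concrete, here is a minimal, self-contained Java sketch. It does not use Elasticsearch or Lucene: analyzeLikeStandard is a hypothetical stand-in that only splits on "@" and whitespace and lowercases, which happens to reproduce the two tokens for this particular value, and termMatches models a term query as an exact lookup among the indexed terms. All class and method names are invented for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class TermLookupSketch {

    // Hypothetical stand-in for the standard analyzer, adequate for this example only:
    // split on '@' and whitespace, then lowercase. The real analyzer follows the
    // Unicode text segmentation rules and handles many more cases.
    static Set<String> analyzeLikeStandard(String value) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String part : value.split("[@\\s]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A keyword field indexes the whole value as one term.
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // A term query does not analyze its input: it matches only when the exact
    // query term is present among the indexed terms.
    static boolean termMatches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String email = "21june6lambdatest7@gmail.com";
        System.out.println(analyzeLikeStandard(email));                     // [21june6lambdatest7, gmail.com]
        System.out.println(termMatches(analyzeLikeStandard(email), email)); // false
        System.out.println(termMatches(indexAsKeyword(email), email));      // true
    }
}
```

The same model also suggests why the employeeId query works: a value such as "13908528" contains nothing to split on, so the single indexed token equals the query term.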
You need to change the query to:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "emailId.keyword": { // note this
              "value": "21june6lambdatest7@gmail.com",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  }
}
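On the Java side, one possible way to apply this without touching the query-building code is to rewrite the field names before they reach the method that builds the term queries. The sketch below uses only the standard library. The class and method names are hypothetical, and it assumes every field in the map is a text field with a keyword sub-field, as in the mapping in the question. Fields mapped directly as keyword or as numbers must not be suffixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeywordFieldNames {

    static final String KEYWORD_SUFFIX = ".keyword";

    // Returns a copy of queryMap whose keys point at the ".keyword" sub-field.
    // Keys that already end in ".keyword" are left unchanged.
    // Assumption: every field here is a text field with a keyword sub-field.
    static Map<String, Object> withKeywordSuffix(Map<String, Object> queryMap) {
        Map<String, Object> result = new LinkedHashMap<>();
        queryMap.forEach((name, value) -> {
            String field = name.endsWith(KEYWORD_SUFFIX) ? name : name + KEYWORD_SUFFIX;
            result.put(field, value);
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> queryMap = new LinkedHashMap<>();
        queryMap.put("emailId", "21june6lambdatest7@gmail.com");
        System.out.println(withKeywordSuffix(queryMap)); // {emailId.keyword=21june6lambdatest7@gmail.com}
    }
}
```

The returned map could then be passed to searchForExactDocument in place of the original queryMap, so that each term query targets the unanalyzed sub-field.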
Update 1: Based on the comments below, change your index mapping and settings to:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "uax_url_email"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "emailId": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Search query:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "emailId": "21june6lambdatest7@gmail.com"
          }
        }
      ],
      "boost": 1.0
    }
  }
}
Search result:
"hits": [
  {
    "_index": "67823510",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.6931471,
    "_source": {
      "emailId": "21june6lambdatest7@gmail.com"
    }
  }
]