
Elasticsearch metrics aggregation: number of elements in an array

How to solve Elasticsearch metrics aggregation: number of elements in an array

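Nice attempt, you were almost there! Here is what I came up with. Based on your mapping proposal, the mapping I am using is the following: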

curl -XPUT localhost:9200/test/_mapping/test -d '{
  "test": {
    "properties": {
      "keyword": {
        "type": "string",
        "index": "not_analyzed"
      },
      "items": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "string"
          },
          "item_property_1": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'

Note: you need to wipe your data and re-index it, because a field cannot be changed from a non-nested type to nested.
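One possible way to carry out that wipe-and-reindex step is to delete the index, create it again with the nested mapping, and then bulk-index the documents again. The sketch below only builds that sequence of requests as plain data and sends nothing. The helper name `reindex_plan` is hypothetical, the paths assume the `test` index and `test` type used in this thread along with the typed-mapping API of that Elasticsearch generation, and the DELETE step discards all existing data.

```python
import json

# Nested mapping from this answer, keyed by the "test" type.
MAPPING = {
    "test": {
        "properties": {
            "keyword": {"type": "string", "index": "not_analyzed"},
            "items": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    "item_property_1": {"type": "string", "index": "not_analyzed"},
                },
            },
        }
    }
}


def reindex_plan():
    """Return (method, path, body) tuples. Hypothetical helper, sends nothing."""
    return [
        # Assumption: dropping the whole index is acceptable (all data is lost).
        ("DELETE", "/test", None),
        # Re-create the index with the nested mapping in place from the start.
        ("PUT", "/test", json.dumps({"mappings": MAPPING})),
        # Placeholder body: the documents have to be bulk-indexed again here.
        ("POST", "/test/test/_bulk", "<bulk body goes here>"),
    ]


if __name__ == "__main__":
    for method, path, _body in reindex_plan():
        print(method, path)
```

Each tuple can then be replayed with curl or any HTTP client against the cluster.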

Then I created some data using the bulk query you shared:

curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{  "keyword": "some keyword",  "items": [    {      "name":"my first item",      "item_property_1":"A"    },    {      "name":"my second item",      "item_property_1":"B"    },    {      "name":"my third item",      "item_property_1":"A"     }  ]}
{ "index": {}}
{  "keyword": "different keyword",  "items": [    {      "name":"cool item",      "item_property_1":"A"    },    {      "name":"awesome item",      "item_property_1":"C"    }  ]}
'

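Finally, here is the aggregation query you can use to get the expected results. We first bucket by keyword using a terms aggregation, and then, for each keyword, we bucket by the nested item_property_1 field. Since items is now of type nested, the key is to use a nested aggregation on items and then a terms sub-aggregation on the item_property_1 field.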

{
  "size": 0,
  "aggregations": {
    "by_keyword": {
      "terms": {
        "field": "keyword"
      },
      "aggs": {
        "prop_1_count": {
          "nested": {
            "path": "items"
          },
          "aggs": {
            "prop_1": {
              "terms": {
                "field": "items.item_property_1"
              }
            }
          }
        }
      }
    }
  }
}
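To make the three levels of that query easier to follow, here is a minimal pure-Python sketch that mimics them on the two sample documents: the outer loop plays the role of the terms aggregation on keyword, the inner loop plays the role of the nested aggregation on items, and the counter stands in for the terms sub-aggregation. The function name `count_by_keyword` is made up for illustration; this is not how Elasticsearch computes the result internally.

```python
from collections import Counter, defaultdict

# The two sample documents from the bulk request.
DOCS = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def count_by_keyword(docs):
    """Count item_property_1 values per keyword, one count per nested item."""
    counts = defaultdict(Counter)
    for doc in docs:                        # like the terms aggregation on keyword
        for item in doc.get("items", []):   # like the nested aggregation on items
            # like the terms sub-aggregation on items.item_property_1
            counts[doc["keyword"]][item["item_property_1"]] += 1
    return {keyword: dict(c) for keyword, c in counts.items()}


if __name__ == "__main__":
    print(count_by_keyword(DOCS))
```

Because every nested item is visited on its own, a value that occurs twice inside one document is counted twice.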

Running that query on your data set will yield the following results:

{
  ...
  "aggregations" : {
    "by_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "different keyword",       <---- keyword 1
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 2,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 1
            }, {
              "key" : "C",
              "doc_count" : 1
            } ]
          }
        }
      }, {
        "key" : "some keyword",            <---- keyword 2
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 3,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 2
            }, {
              "key" : "B",
              "doc_count" : 1
            } ]
          }
        }
      } ]
    }
  }
}
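If a flatter shape is preferred, with one entry per keyword and a list of key/count pairs, the response can be reshaped on the client side. The following is a small sketch under the assumption that the response has been parsed into a Python dict; `flatten_buckets` and the output field names `counts` and `count` are illustrative choices, not part of any Elasticsearch API.

```python
# Trimmed copy of the response above, keeping only the fields that are used.
SAMPLE_RESPONSE = {
    "aggregations": {"by_keyword": {"buckets": [
        {"key": "different keyword", "doc_count": 1,
         "prop_1_count": {"doc_count": 2, "prop_1": {"buckets": [
             {"key": "A", "doc_count": 1},
             {"key": "C", "doc_count": 1},
         ]}}},
        {"key": "some keyword", "doc_count": 1,
         "prop_1_count": {"doc_count": 3, "prop_1": {"buckets": [
             {"key": "A", "doc_count": 2},
             {"key": "B", "doc_count": 1},
         ]}}},
    ]}}
}


def flatten_buckets(response):
    """Turn the nested aggregation response into one entry per keyword."""
    result = []
    for kw_bucket in response["aggregations"]["by_keyword"]["buckets"]:
        inner = kw_bucket["prop_1_count"]["prop_1"]["buckets"]
        result.append({
            "keyword": kw_bucket["key"],
            "counts": [{"key": b["key"], "count": b["doc_count"]} for b in inner],
        })
    return result


if __name__ == "__main__":
    for entry in flatten_buckets(SAMPLE_RESPONSE):
        print(entry)
```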

Original question

I want to do a fairly complex query/aggregation. I can't see how to do it, since I am just getting started with ES. My documents look like this:

{
  "keyword": "some keyword",
  "items": [
    {
      "name":"my first item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"my second item",
      "item_property_1":"B",
      ( other properties here )
    },
    {
      "name":"my third item",
      "item_property_1":"A",
      ( other properties here )
    }
  ]
  ( other properties... )
},
{
  "keyword": "different keyword",
  "items": [
    {
      "name":"cool item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"awesome item",
      "item_property_1":"C",
      ( other properties here )
    }
  ]
  ( other properties... )
},
( other documents... )

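Now, for each keyword, I would like to count how many items have each of the possible values that item_property_1 can take. That is, I need a bucket aggregation with the following response: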

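{
  "keyword": "some keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 2
    },
    {
      "key":"B",
      "count": 1
    }
  ]
},
{
  "keyword": "different keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 1
    },
    {
      "key":"C",
      "count": 1
    }
  ]
},
( other keywords... )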

If a mapping is needed, could you also specify which one? I don't have any non-default mapping; I just dumped everything in there.

Edit: to save you the trouble, here is a bulk PUT of the previous example:

PUT /test/test/_bulk
{ "index": {}}
{  "keyword": "some keyword","items": [    {      "name":"my first item","item_property_1":"A"    },{      "name":"my second item","item_property_1":"B"    },{      "name":"my third item","item_property_1":"A"     }  ]}
{ "index": {}}
{  "keyword": "different keyword","items": [    {      "name":"cool item","item_property_1":"A"    },{      "name":"awesome item","item_property_1":"C"    }  ]}

Edit 2:

I just tried this:

POST /test/test/_search
{
    "size":2,
    "aggregations": {
        "property_1_count": {
            "terms":{
                "field":"item_property_1"
            }
        }
    }
}

And got this:

"aggregations": {
   "property_1_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
         {
            "key": "a",
            "doc_count": 2
         },
         {
            "key": "b",
            "doc_count": 1
         },
         {
            "key": "c",
            "doc_count": 1
         }
      ]
   }
}

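Close, but no cigar. You can see what is happening: it is bucketing over item_property_1 regardless of which keyword the items belong to. I am sure the solution involves adding some mapping correctly, but I can't put my finger on it. Any suggestions?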
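For readers following along, the numbers in that response can be reproduced with a short pure-Python sketch. It rests on two assumptions: that without a nested mapping the items array is flattened into the parent document, so a terms aggregation counts each document at most once per distinct value, and that the default analyzer lowercases the values, which would explain the lowercase keys. The function name `flat_terms_counts` is hypothetical.

```python
from collections import Counter

# The two sample documents from the bulk request.
DOCS = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def flat_terms_counts(docs):
    """Count documents (not items) per lowercased item_property_1 value."""
    counts = Counter()
    for doc in docs:
        # A set: each document contributes at most once per distinct term,
        # and the keyword of the document plays no role at all.
        terms = {item["item_property_1"].lower() for item in doc["items"]}
        counts.update(terms)
    return dict(counts)


if __name__ == "__main__":
    print(flat_terms_counts(DOCS))
```

Under these assumptions "a" gets a count of 2 because two documents contain it, not because of how many items carry it.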

EDIT 3: Based on this: https://www.elastic.co/guide/zh-cn/elasticsearch/reference/current/mapping-nested-type.html
I figured I would try adding a nested type to the items property. To do so, I tried:

PUT /test/_mapping/test
{
    "test":{
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    "item_property_1":{"type":"string"}
                }
            }
        }
    }
}

However, this returns an error:

{
   "error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
   "status": 400
}

This may be related to the warning on that URL: "changing an object type to nested type requires reindexing".

So, how do I do that?
