How to aggregate an Elastic query by specified time ranges for each day
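Hi, I need to write a query that aggregates data by work shifts, within selected time ranges, across several days. The problem is that I don't want to list every range explicitly in the date_range aggregation. I'd rather specify only the `from` -> `to` time-of-day ranges and have them applied to each aggregated day. Is there a simple way to do this?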
{
  "_source": false,
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "streamId": [
              "ENRG_0054"
            ]
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "2021-02-01T00:00:00Z",
              "lte": "2021-02-10T01:00:00Z"
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "timestamp": {
        "order": "asc"
      }
    },
    {
      "_score": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "streamId": {
      "terms": {
        "field": "streamId",
        "size": 10000
      },
      "aggs": {
        "days": {
          "date_histogram": {
            "field": "timestamp",
            "interval": "1d"
          },
          "aggs": {
            "shifts": {
              "date_range": {
                "field": "timestamp",
                "format": "HH:mm",
                "ranges": [
                  {
                    "key": "MORNING",
                    "from": "06:00",
                    "to": "14:00"
                  },
                  {
                    "key": "AFTERNOON",
                    "from": "14:00",
                    "to": "22:00"
                  }
                ],
                "keyed": true
              },
              "aggs": {
                "MAX": {
                  "max": {
                    "field": "@floatMessage.value.value"
                  }
                },
                "MIN": {
                  "min": {
                    "field": "@floatMessage.value.value"
                  }
                },
                "DIFF": {
                  "bucket_script": {
                    "buckets_path": {
                      "min": "MIN",
                      "max": "MAX"
                    },
                    "script": {
                      "source": "return (params.max-params.min)"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
But in the results I get null values, because the time ranges are not specified together with a date.
"aggregations": {
  "streamId": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "ENRG_0054",
        "doc_count": 13343,
        "days": {
          "buckets": [
            {
              "key_as_string": "2021-02-01T00:00:00.000Z",
              "key": 1612137600000,
              "doc_count": 2763,
              "shifts": {
                "buckets": {
                  "MORNING": {
                    "from": 2.16E7,
                    "from_as_string": "06:00",
                    "to": 5.04E7,
                    "to_as_string": "14:00",
                    "doc_count": 0,
                    "MIN": {
                      "value": null
                    },
                    "MAX": {
                      "value": null
                    }
                  },
                  "AFTERNOON": {
                    "from": 5.04E7,
                    "from_as_string": "14:00",
                    "to": 7.92E7,
                    "to_as_string": "22:00",
                    "MAX": {
                      "value": null
                    }
                  }
                }
              }
            },
            ...
Sample document:
{
  "streamId": "ENRG_0054",
  "created": "2021-02-01T00:19:42.905Z",
  "extra": {},
  "location": null,
  "model": "floatMessage",
  "id": "6017491eb112b21488f6c843",
  "value": {
    "unit": "°C",
    "value": 18.94,
    "messageProcessed": "2021-02-01T00:19:41.595Z"
  },
  "timestamp": "2021-02-01T00:19:39.161Z",
  "tags": []
}
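When I generate all the date_ranges for the whole timestamp range of the query, the results are correct. Is that the only way to get the result I need, or can someone suggest how to update the query to meet my requirements? Thanks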
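The workaround mentioned in the question, listing every day's shifts as explicit `date_range` entries, does not have to be written by hand. A minimal Python sketch, assuming all timestamps are UTC and reusing the MORNING/AFTERNOON hours from the query above; `build_shift_ranges` and `SHIFTS` are hypothetical names, not part of any Elasticsearch client:

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta

# Hypothetical shift table mirroring the MORNING/AFTERNOON ranges of the query above (UTC).
SHIFTS = {
    "MORNING": (time(6, 0), time(14, 0)),
    "AFTERNOON": (time(14, 0), time(22, 0)),
}


def build_shift_ranges(start: date, end: date) -> list[dict]:
    """Return one date_range entry per (day, shift) with full UTC datetimes; `end` is inclusive."""
    ranges = []
    day = start
    while day <= end:
        for key, (t_from, t_to) in SHIFTS.items():
            ranges.append({
                "key": f"{day.isoformat()}_{key}",  # e.g. "2021-02-01_MORNING"
                "from": datetime.combine(day, t_from).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "to": datetime.combine(day, t_to).strftime("%Y-%m-%dT%H:%M:%SZ"),
            })
        day += timedelta(days=1)
    return ranges


# Full datetimes need no custom "format", assuming the timestamp field keeps the
# default date mapping format (the query's own range filter uses the same style).
shifts_agg = {
    "date_range": {
        "field": "timestamp",
        "ranges": build_shift_ranges(date(2021, 2, 1), date(2021, 2, 10)),
        "keyed": True,
    }
}
```

Because every key already carries its day, such a list would presumably sit directly under the `streamId` terms aggregation; nested inside the `days` histogram, all but two of the ranges would be empty in every day bucket.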
Solution
The reason your `date_range` buckets come back empty is related to the `datetime` vs. `date` reasoning, similar to what I discussed here a while ago.
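In short, the `date_range` aggregation has trouble with time-only values (`HH:mm`) as opposed to full datetime values (`MM-dd-yyyy HH:mm`), because:
- if no `year` is provided, it defaults to 1970
- if no `month` is provided, it defaults to January
- if no `day` is provided, it defaults to the 1st of the month
- and so on.
So `"from": "06:00"` is resolved to 1970-01-01T06:00:00Z, i.e. the `2.16E7` milliseconds shown in the response above, and no document from 2021 can fall into such a range.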
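The 1970 defaulting can be reproduced with plain Python. This sketch assumes UTC (the `date_range` default time zone) and uses a hypothetical helper, `time_only_to_epoch_millis`, that fills the missing date parts with 1970-01-01 the way the list above describes; it is an illustration, not Elasticsearch's actual parser:

```python
from datetime import datetime, timezone


def time_only_to_epoch_millis(hhmm: str) -> int:
    """Mimic parsing an HH:mm value with the missing date parts set to 1970-01-01 (UTC)."""
    hours, minutes = map(int, hhmm.split(":"))
    dt = datetime(1970, 1, 1, hours, minutes, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def iso_to_epoch_millis(ts: str) -> int:
    """Convert a UTC timestamp such as '2021-02-01T00:19:39.161Z' to epoch milliseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return round(dt.timestamp() * 1000)


morning = (time_only_to_epoch_millis("06:00"), time_only_to_epoch_millis("14:00"))
afternoon = (time_only_to_epoch_millis("14:00"), time_only_to_epoch_millis("22:00"))
doc_ts = iso_to_epoch_millis("2021-02-01T00:19:39.161Z")  # the sample document's timestamp

print(morning, afternoon)  # (21600000, 50400000) (50400000, 79200000), i.e. 2.16E7, 5.04E7, 7.92E7
print(morning[0] <= doc_ts < morning[1])  # False: the 2021 document lies far past the 1970 range
```

The bounds match the `from`/`to` values in the response, which is why every shift bucket reports `doc_count: 0`.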
You see, if you only added the year component:
"date_range": {
  "field": "timestamp",
  "format": "HH:mm yyyy",      <---
  "ranges": [
    {
      "key": "MORNING",
      "from": "06:00 2021",    <---
      "to": "14:00 2021"       <---
    }
  ],
  "keyed": true
}
Elasticsearch would return:
"MORNING" : {
  "from" : 2.16E7,
  "from_as_string" : "06:00 1970",   <--- ?
  "to" : 5.04E7,
  "to_as_string" : "14:00 1970",     <--- ?
  ...
}
Adding the `month` would resolve this particular point-in-time issue, but it would of course introduce the problem of only being able to aggregate within one month of one concrete year.
So here's what I'd suggest:
- Add one more `date` field called `time` to your mapping:
{
  "mappings": {
    "properties": {
      "streamId": {
        "type": "keyword"
      },
      ...
      "time": {
        "type": "date",              <---
        "format": "HH:mm:ss.SSSz"
      }
    }
  }
}
- Add this new field to each document (or use an ingest pipeline or a scripted `_update_by_query` call):
{
  "streamId": "ENRG_0054",
  ...
  "timestamp": "2021-02-01T00:19:39.161Z",
  "time": "00:19:39.161Z",           <---
  "tags": []
}
- Use the same query as above, but aggregate on the `time` field:
"days": {
  "date_histogram": {
    "field": "timestamp",     <---
    "interval": "1d"
  },
  "aggs": {
    "shifts": {
      "date_range": {
        "field": "time",      <---
        "format": "HH:mm",
        "ranges": [
          ...
That's it!
P.S. Under the hood, the `time` values will be auto-assigned to 1970, but that's fine because you're only interested in the time part.
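To see what the suggested setup computes, here is a local Python emulation, not Elasticsearch code. It derives the `time` value from `timestamp` the way the ingest step presumably would (keeping the part after the `T`, so `2021-02-01T00:19:39.161Z` becomes `00:19:39.161Z`), groups documents per UTC day (like the `date_histogram` on `timestamp`) and per shift by time of day (like the `date_range` on `time`), and computes MAX minus MIN (like the `DIFF` bucket_script). The helper names and sample readings are made up for illustration, and only UTC `Z` timestamps are handled:

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, time

# Same shift boundaries as the query; like date_range, "from" is inclusive and "to" exclusive.
SHIFTS = {
    "MORNING": (time(6, 0), time(14, 0)),
    "AFTERNOON": (time(14, 0), time(22, 0)),
}


def derive_time_field(timestamp: str) -> str:
    """Hypothetical stand-in for the ingest / _update_by_query step: keep the part after 'T'."""
    return timestamp.split("T", 1)[1]


def shift_diffs(docs: list[dict]) -> dict[str, dict[str, float]]:
    """Emulate days -> shifts -> DIFF (MAX - MIN of value.value) for UTC 'Z' timestamps."""
    readings = defaultdict(list)
    for doc in docs:
        day = doc["timestamp"][:10]  # like the 1d date_histogram on `timestamp` (UTC)
        tod = datetime.strptime(derive_time_field(doc["timestamp"]), "%H:%M:%S.%fZ").time()
        for key, (start, end) in SHIFTS.items():  # like the date_range on `time`
            if start <= tod < end:
                readings[(day, key)].append(doc["value"]["value"])
    result: dict[str, dict[str, float]] = defaultdict(dict)
    for (day, key), values in readings.items():
        result[day][key] = max(values) - min(values)  # like the DIFF bucket_script
    return dict(result)


sample = [
    {"timestamp": "2021-02-01T00:19:39.161Z", "value": {"value": 18.94}},  # 00:19, in no shift
    {"timestamp": "2021-02-01T07:00:00.000Z", "value": {"value": 18.5}},
    {"timestamp": "2021-02-01T13:59:59.999Z", "value": {"value": 20.0}},
    {"timestamp": "2021-02-01T14:00:00.000Z", "value": {"value": 21.0}},
    {"timestamp": "2021-02-01T21:30:00.000Z", "value": {"value": 23.5}},
    {"timestamp": "2021-02-02T06:00:00.000Z", "value": {"value": 17.0}},
]
print(shift_diffs(sample))
# {'2021-02-01': {'MORNING': 1.5, 'AFTERNOON': 2.5}, '2021-02-02': {'MORNING': 0.0}}
```

Unlike the real keyed `date_range`, this emulation simply omits empty shifts instead of returning them with `null` values.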