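How to combine aggregate functions with resampling in Impala
I have a table in Hadoop that holds data from different sensor units, with a sampling time ts of 1 millisecond. With the following Impala query I can resample the data of a single unit using a combination of aggregate functions (say I want to resample the data every 5 minutes using LAST_VALUE() as the aggregate function):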
SELECT DISTINCT *
FROM (
  SELECT ts_resample, unit,
         last_value(Val1) OVER (PARTITION BY ts_resample ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val1,
         last_value(Val2) OVER (PARTITION BY ts_resample ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val2
  FROM (
    SELECT cast(cast(unix_timestamp(cast(ts/1000 AS TIMESTAMP))/300 AS bigint)*300 AS TIMESTAMP) AS ts_resample,
           ts AS ts, unit AS unit, Val1 AS Val1, Val2 AS Val2
    FROM Sensor_Data.Table1
    WHERE unit='Unit1') AS t) AS tt
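The bucketing expression in the innermost SELECT can be checked outside Impala. Below is a minimal Python sketch of the same arithmetic, assuming ts holds non-negative epoch milliseconds (as the ts/1000 cast suggests) and that timestamps are interpreted as UTC. The name resample_bucket is a hypothetical helper for illustration, not an Impala function.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5-minute resampling interval, as in the query


def resample_bucket(ts_ms: int, bucket_seconds: int = BUCKET_SECONDS) -> datetime:
    """Floor an epoch-millisecond timestamp to the start of its bucket.

    Mirrors cast(cast(unix_timestamp(cast(ts/1000 as TIMESTAMP))/300 as bigint)*300
    as TIMESTAMP) for non-negative timestamps (assumption).
    """
    seconds = ts_ms // 1000                                 # ms -> whole seconds
    bucket_start = (seconds // bucket_seconds) * bucket_seconds
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc)


# The last millisecond of the first bucket and the first millisecond of the next.
print(resample_bucket(1606781099999))
print(resample_bucket(1606781100000))
```

Every row whose ts falls within the same 5-minute interval gets the same ts_resample value, whichever unit it belongs to.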
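If I run this query for a single unit, I get the correct answer and there is no problem.
But if I want to resample the data for every unit with an aggregate function such as LAST_VALUE(), I get the wrong answer: the resampled result is identical for every unit, even though each unit's data is different. The query I am running is given below; it does not specify any unit in the WHERE clause: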
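SELECT DISTINCT *
FROM (
  SELECT ts_resample, unit,
         last_value(Val1) OVER (PARTITION BY ts_resample ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val1,
         last_value(Val2) OVER (PARTITION BY ts_resample ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val2
  FROM (
    SELECT cast(cast(unix_timestamp(cast(ts/1000 AS TIMESTAMP))/300 AS bigint)*300 AS TIMESTAMP) AS ts_resample,
           ts AS ts, unit AS unit, Val1 AS Val1, Val2 AS Val2
    FROM Sensor_Data.Table1) AS t) AS tt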
Running the above query on the three units currently in the data gives the following result:
ts_resample unit Val1 Val2
2020-12-01 00:00:00 unit1 0.8974 10.485
2020-12-01 00:00:00 unit2 0.8974 10.485
2020-12-01 00:00:00 unit3 0.8974 10.485
2020-12-01 00:05:00 unit1 0.9041 11.854
2020-12-01 00:05:00 unit2 0.9041 11.854
2020-12-01 00:05:00 unit3 0.9041 11.854
What I actually want is the last value for each unit, which differs from unit to unit, as shown below:
ts_resample unit Val1 Val2
2020-12-01 00:00:00 unit1 0.8974 10.485
2020-12-01 00:00:00 unit2 0.9014 11.954
2020-12-01 00:00:00 unit3 0.7854 10.821
2020-12-01 00:05:00 unit1 0.9841 11.125
2020-12-01 00:05:00 unit2 0.8742 10.963
2020-12-01 00:05:00 unit3 0.9632 11.784
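Can anyone tell me what is wrong with my query?
Thanks
Solution
I solved this by including the unit in the partition together with ts_resample. With PARTITION BY ts_resample alone, all units in the same 5-minute bucket share one window, so LAST_VALUE() returns the same row's value for every unit. The final solution is as follows: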
SELECT DISTINCT *
FROM (
  SELECT ts_resample, unit,
         last_value(Val1) OVER (PARTITION BY ts_resample, unit ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val1,
         last_value(Val2) OVER (PARTITION BY ts_resample, unit ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Val2
  FROM (
    SELECT cast(cast(unix_timestamp(cast(ts/1000 AS TIMESTAMP))/300 AS bigint)*300 AS TIMESTAMP) AS ts_resample,
           ts AS ts, unit AS unit, Val1 AS Val1, Val2 AS Val2
    FROM Sensor_Data.Table1) AS t) AS tt
After this change, I got the result I wanted, as shown in my question.
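The effect of the partition key can be reproduced on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for Impala (an assumption: it needs SQLite 3.25 or newer for window functions, and the dialects differ in other respects). The table readings, its columns, and the helper last_values are hypothetical names chosen for illustration, and the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (bucket INTEGER, ts INTEGER, unit TEXT, val REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        # One resampling bucket, three units, distinct timestamps (toy data).
        (0, 1, "unit1", 0.10),
        (0, 2, "unit2", 0.20),
        (0, 3, "unit1", 0.11),
        (0, 4, "unit2", 0.21),
        (0, 5, "unit3", 0.30),
    ],
)

QUERY = """
SELECT DISTINCT bucket, unit,
       last_value(val) OVER (
           PARTITION BY {partition_cols} ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS val
FROM readings
ORDER BY bucket, unit
"""


def last_values(partition_cols: str):
    """Run the window query with the given PARTITION BY column list."""
    return conn.execute(QUERY.format(partition_cols=partition_cols)).fetchall()


shared = last_values("bucket")          # window spans all units in the bucket
per_unit = last_values("bucket, unit")  # one window per (bucket, unit) pair

print(shared)
print(per_unit)
```

With the bucket-only partition, every unit reports the value of the latest row in the whole bucket (0.30 here). Adding unit to the partition gives each unit its own latest value (0.11, 0.21 and 0.30).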