
ClickHouse: sliding / moving window


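I am looking for an efficient way to query the n past values in ClickHouse for each row, with rows ordered by one column (i.e. Time). The values should be returned as an array.

Window functions are still not supported in ClickHouse (see #1469), so I was hoping for a workaround using aggregate functions such as groupArray().

Table: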

Time  | Value
12:11 | 1
12:12 | 2
12:13 | 3
12:14 | 4
12:15 | 5
12:16 | 6

Expected result for window size n=3:

Time  | Value
12:13 | [1,2,3]
12:14 | [2,3,4]
12:15 | [3,4,5]
12:16 | [4,5,6]
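
To pin down the expected semantics independently of ClickHouse, here is a minimal Python sketch. It is only a reference model, not ClickHouse code; the function name sliding_windows is made up for illustration, and the rows are assumed to be sorted by Time already.

```python
from collections import deque


def sliding_windows(rows, n):
    """Yield (time, last n values) for every row that has n values available.

    Assumes `rows` is already sorted by time.
    """
    window = deque(maxlen=n)  # keeps only the n most recent values
    for time, value in rows:
        window.append(value)
        if len(window) == n:  # skip rows with fewer than n past values
            yield time, list(window)


rows = [("12:11", 1), ("12:12", 2), ("12:13", 3),
        ("12:14", 4), ("12:15", 5), ("12:16", 6)]
result = list(sliding_windows(rows, 3))
```

The first two rows produce no output because fewer than n values are available for them.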

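Which approaches or functions does ClickHouse currently offer for efficiently querying a sliding/moving window, and how can I get the desired result?

EDIT:

My solution, based on the response from @vladimir: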

select max(Time) as Time,groupArray(Value) as Values
from (
    select
        *,rowNumberInAllBlocks() as row_number,arrayJoin(range(row_number,row_number + 3)) as window_id
    from (
        /* BEGIN emulate origin dataset */
        select toDateTime(a) as Time,rowNumberInAllBlocks()+1 as Value
        from (
            select arrayJoin([
                '2020-01-01 12:11:00','2020-01-01 12:12:00','2020-01-01 12:13:00','2020-01-01 12:14:00','2020-01-01 12:15:00','2020-01-01 12:16:00']) a
        )
        order by Time
        /* END emulate origin dataset */
    )
    order by Time
) s
group by window_id
having length(Values) = 3
order by Time

Note that 3 appears twice in the query; both occurrences represent the window size n.

Output:

┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘
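
The window_id trick in the query can be hard to follow, so here is a rough Python model of it. It is a sketch under assumptions, not ClickHouse code: windows_by_duplication is a hypothetical name, enumerate() stands in for rowNumberInAllBlocks(), and values keep their row order inside each group, which the SQL gets from the sorted subquery.

```python
from collections import defaultdict


def windows_by_duplication(rows, n):
    """Model of: arrayJoin(range(row_number, row_number + n)) ... GROUP BY window_id."""
    groups = defaultdict(list)
    for row_number, (time, value) in enumerate(rows):
        # Each row is emitted n times, once for every window it belongs to.
        for window_id in range(row_number, row_number + n):
            groups[window_id].append((time, value))

    result = []
    for members in groups.values():
        if len(members) == n:  # HAVING length(Values) = n drops partial windows
            latest_time = max(t for t, _ in members)  # max(Time)
            result.append((latest_time, [v for _, v in members]))
    result.sort(key=lambda item: item[0])  # ORDER BY Time
    return result


rows = [("2020-01-01 12:1%d:00" % i, i) for i in range(1, 7)]
windows = windows_by_duplication(rows, 3)
```

Row k lands in windows k, k+1, ..., k+n-1, so window w collects rows w-n+1 through w. Only windows that received all n rows survive the length filter.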

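Solution

ClickHouse has several functions that act like window functions but are scoped to a single data block. Let's start with neighbor (a second approach, which duplicates each source row window_size times, follows it):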

SELECT Time,[neighbor(Value,-2),neighbor(Value,-1),neighbor(Value,0)] Values
FROM (
  /* emulate origin data */
  SELECT toDateTime(data.1) as Time,data.2 as Value
  FROM (
    SELECT arrayJoin([('2020-01-01 12:11:00',1),('2020-01-01 12:12:00',2),('2020-01-01 12:13:00',3),('2020-01-01 12:14:00',4),('2020-01-01 12:15:00',5),('2020-01-01 12:16:00',6)]) as data)
  )

/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [0,0,1] │
│ 2020-01-01 12:12:00 │ [0,1,2] │
│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │
└─────────────────────┴─────────┘

*/
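
The leading rows are padded with 0 because neighbor returns the type's default value when the requested offset falls outside the data. A small Python model of that behaviour is sketched below. It assumes all rows sit in one data block (in ClickHouse the function only sees the current block, so results near block boundaries may differ), and the helper is an illustrative stand-in, not the real implementation.

```python
def neighbor(column, index, offset, default=0):
    """Return column[index + offset], or `default` when that position is out of range."""
    pos = index + offset
    if 0 <= pos < len(column):
        return column[pos]
    return default


values = [1, 2, 3, 4, 5, 6]
windows = [
    [neighbor(values, i, -2), neighbor(values, i, -1), neighbor(values, i, 0)]
    for i in range(len(values))
]
```

Unlike the groupArray-based query, this approach produces a row for every input row, so partial windows have to be filtered out separately if they are not wanted.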

Another approach, based on duplicating each source row window_size times:

SELECT   
  arrayReduce('max',arrayMap(x -> x.1,raw_result)) Time,arrayMap(x -> x.2,raw_result) Values
FROM (  
  SELECT groupArray((Time,Value)) raw_result,max(row_number) max_row_number
  FROM (
    SELECT 
      3 AS window_size,*,rowNumberInAllBlocks() row_number,arrayJoin(arrayMap(x -> x + row_number,range(window_size))) window_id
    FROM (
      /* emulate origin dataset */
      SELECT toDateTime(data.1) as Time,data.2 as Value
      FROM (
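        SELECT arrayJoin([('2020-01-01 12:11:00',1),('2020-01-01 12:12:00',2),('2020-01-01 12:13:00',3),('2020-01-01 12:14:00',4),('2020-01-01 12:15:00',5),('2020-01-01 12:16:00',6)]) as data)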
      ORDER BY Value
      )
    )
  GROUP BY window_id
  HAVING max_row_number = window_id
  ORDER BY window_id
  )
/*
┌────────────────Time─┬─Values──┐
│ 2020-01-01 12:11:00 │ [1]     │
│ 2020-01-01 12:12:00 │ [1,2]   │
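│ 2020-01-01 12:13:00 │ [1,2,3] │
│ 2020-01-01 12:14:00 │ [2,3,4] │
│ 2020-01-01 12:15:00 │ [3,4,5] │
│ 2020-01-01 12:16:00 │ [4,5,6] │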
└─────────────────────┴─────────┘
*/
