Retention analysis: ASOF JOIN with multiple inequalities

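I want to run retention/event analysis on tracking ("buried point") data stored in ClickHouse. Suppose I have two types of events: app_launch (buried_point_id = 1) and user_register (buried_point_id = 2). I want to know how many users registered within a 1-day window after launching the app. See the sample data below: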

buried_point_id  happened_at  user_id
1                1613923200   123
1                1614009600   345
2                1613966400   123
2                1614009600   234
2                1614182400   345

This is the query I would like to run:

SELECT COUNT(DISTINCT t1.user_id), COUNT(DISTINCT t2.user_id)
FROM
  (SELECT user_id, happened_at
   FROM buried_points
   WHERE buried_point_id = 1
     AND happened_at >= 1613923200
     AND happened_at <= 1614182400) AS t1
ASOF LEFT JOIN
  (SELECT user_id, happened_at
   FROM buried_points
   WHERE buried_point_id = 2
     AND happened_at >= 1613923200
     AND happened_at <= 1614182400) AS t2
ON t1.user_id = t2.user_id
AND t1.happened_at < t2.happened_at
AND t2.happened_at - t1.happened_at < 86400;

This is the expected query result:

2 (123, 345), 1 (123)
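
As a sanity check on the expected numbers, here is a minimal plain-Python sketch (not ClickHouse) that applies both conditions to the sample rows. The variable names are chosen for illustration only; the rows are copied from the sample table above.

```python
DAY = 86400  # one day in seconds

# (buried_point_id, happened_at, user_id) rows copied from the sample table
events = [
    (1, 1613923200, 123),
    (1, 1614009600, 345),
    (2, 1613966400, 123),
    (2, 1614009600, 234),
    (2, 1614182400, 345),
]

launches = [(t, u) for p, t, u in events if p == 1]
registrations = [(t, u) for p, t, u in events if p == 2]

# Users with at least one app_launch event
launched_users = {u for _, u in launches}

# Users with a registration strictly after one of their launches
# and less than one day later (the two inequalities from the query)
registered_users = {
    lu
    for lt, lu in launches
    for rt, ru in registrations
    if ru == lu and 0 < rt - lt < DAY
}

print(len(launched_users), sorted(launched_users))
print(len(registered_users), sorted(registered_users))
```

User 345 launched but registered two days later, so only user 123 falls inside the window.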

However, according to the ClickHouse docs, only one inequality is supported:

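You can use any number of equality conditions and exactly one closest match condition. For example, SELECT count() FROM table_1 ASOF LEFT JOIN table_2 ON table_1.a == table_2.b AND table_2.t <= table_1.t. Conditions supported for the closest match: >, >=, <, <=.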

But I need two inequalities to get my result. Is there a way around this?
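
To make the restriction concrete, here is a plain-Python sketch of what a single closest-match condition does for one user: for each launch, pick the nearest registration that is strictly later. The function name `asof_left_join_after` is hypothetical and this is only an illustration of the semantics, not ClickHouse's implementation. Because the nearest later row also has the smallest gap, the 1-day test can, under that assumption, be applied to the matched row afterwards.

```python
from bisect import bisect_right

DAY = 86400  # one day in seconds


def asof_left_join_after(left_times, right_times):
    """For each left timestamp, return (left, closest right timestamp that is
    strictly greater), or (left, None) when no later right row exists."""
    right_sorted = sorted(right_times)
    matches = []
    for t in left_times:
        i = bisect_right(right_sorted, t)  # first index whose value is > t
        matches.append((t, right_sorted[i] if i < len(right_sorted) else None))
    return matches


# User 123 from the sample data: registration 12 hours after launch
pairs_123 = asof_left_join_after([1613923200], [1613966400])
# User 345 from the sample data: registration two days after launch
pairs_345 = asof_left_join_after([1614009600], [1614182400])

# Second inequality applied to the matched pair only
in_window_123 = [(l, r) for l, r in pairs_123 if r is not None and r - l < DAY]
in_window_345 = [(l, r) for l, r in pairs_345 if r is not None and r - l < DAY]

print(in_window_123)
print(in_window_345)
```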

Solution

Consider using the specialized aggregate function sequenceMatch:

SELECT
  user_id,
  sequenceMatch('(?1)(?2)')(happened_at, buried_point_id = 1, buried_point_id = 2) AS retention
FROM (
  /* emulate test dataset */
  SELECT data.1 AS buried_point_id, data.2 AS happened_at, data.3 AS user_id
  FROM (
    SELECT arrayJoin(
      [(1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
       (1, 1614009600, 345), (1, 1614009611, 345),
       (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345)]) AS data)
  )
WHERE happened_at >= 1613923200 AND happened_at <= 1614182400
GROUP BY user_id

/*
┌─user_id─┬─retention─┐
│     123 │         0 │
│     234 │         0 │
│     345 │         1 │
└─────────┴───────────┘
*/
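
The pattern '(?1)(?2)' asks whether a launch event is followed by a registration event in time order. A rough plain-Python approximation of that check on the emulated dataset is sketched below; the helper `launch_then_register` and its `max_gap` parameter are hypothetical and do not reproduce ClickHouse's matcher. The sequenceMatch documentation also describes a time condition of the form (?t operator value), for example '(?1)(?t<86400)(?2)', which looks like the natural way to express the 1-day window; `max_gap` mimics that idea here.

```python
from collections import defaultdict

# (buried_point_id, happened_at, user_id): the emulated test dataset
events = [
    (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
    (1, 1614009600, 345), (1, 1614009611, 345),
    (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345),
]


def launch_then_register(user_events, max_gap=None):
    """Return 1 if a launch (id 1) is directly followed, in time order, by a
    registration (id 2), optionally less than max_gap seconds later."""
    ordered = sorted(user_events)  # tuples are (happened_at, buried_point_id)
    for (t1, p1), (t2, p2) in zip(ordered, ordered[1:]):
        if p1 == 1 and p2 == 2 and (max_gap is None or t2 - t1 < max_gap):
            return 1
    return 0


by_user = defaultdict(list)
for point_id, happened_at, user_id in events:
    by_user[user_id].append((happened_at, point_id))

# Without a time limit, as in the query above
retention = {u: launch_then_register(ev) for u, ev in sorted(by_user.items())}
# With the 1-day window from the question
retention_1d = {u: launch_then_register(ev, 86400) for u, ev in sorted(by_user.items())}

print(retention)
print(retention_1d)
```

In this dataset user 345 registers about two days after the last launch, so the flag for 345 drops to 0 once the window is enforced.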

A calculation based on minIf:

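SELECT
  user_id,
  minIf(happened_at, buried_point_id = 1) AS first_launch,
  minIf(happened_at, buried_point_id = 2) AS first_registration,
  (first_launch != 0 AND first_registration > first_launch) ? 1 : 0 AS is_user_registered_after_launch
FROM (
  /* emulate test dataset */
  SELECT data.1 AS buried_point_id, data.2 AS happened_at, data.3 AS user_id
  FROM (
    SELECT arrayJoin(
      [(1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
       (1, 1614009600, 345), (1, 1614009611, 345),
       (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345)]) AS data)
  )
WHERE happened_at >= 1613923200 AND happened_at <= 1614182400
GROUP BY user_id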

/*
┌─user_id─┬─first_launch─┬─first_registration─┬─is_user_registered_after_launch─┐
│     123 │   1613966400 │         1613923200 │                               0 │
│     234 │            0 │         1614009600 │                               0 │
│     345 │   1614009600 │         1614182400 │                               1 │
└─────────┴──────────────┴────────────────────┴─────────────────────────────────┘
*/
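
The same per-user logic can be sketched in plain Python to show how the three columns are derived. This is an illustration only: `first_or_zero` is a hypothetical helper that mirrors the 0 shown in the result table for a user with no matching rows.

```python
# (buried_point_id, happened_at, user_id): the emulated test dataset
events = [
    (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
    (1, 1614009600, 345), (1, 1614009611, 345),
    (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345),
]


def first_or_zero(values):
    # An empty group falls back to 0, matching first_launch = 0 for user 234
    return min(values, default=0)


rows = []
for user in sorted({u for _, _, u in events}):
    first_launch = first_or_zero(
        [t for p, t, u in events if u == user and p == 1])
    first_registration = first_or_zero(
        [t for p, t, u in events if u == user and p == 2])
    # Flag is set only when a launch exists and the first registration is later
    flag = int(first_launch != 0 and first_registration > first_launch)
    rows.append((user, first_launch, first_registration, flag))

for row in rows:
    print(row)
```

This compares only the first launch with the first registration and does not enforce the 1-day window; adding `first_registration - first_launch < 86400` to the flag would be one way to include it.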
