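How to do retention analysis: ASOF JOIN with multiple inequalities
I want to run retention/event analysis on event-tracking ("buried point") data stored in ClickHouse. Suppose I have two types of events: app_launch (buried_point_id=1) and user_register (buried_point_id=2). I want to know how many users registered within a 1-day window after launching the app. See the sample data below:
buried_point_id | happened_at | user_id |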
---|---|---|
1 | 1613923200 | 123 |
1 | 1614009600 | 345 |
2 | 1613966400 | 123 |
2 | 1614009600 | 234 |
2 | 1614182400 | 345 |
This is the query I want to run:
SELECT COUNT(DISTINCT t1.user_id), COUNT(DISTINCT t2.user_id)
FROM
(SELECT user_id, happened_at
FROM buried_points
WHERE buried_point_id = 1
AND happened_at >= 1613923200
AND happened_at <= 1614182400) AS t1
ASOF LEFT JOIN
(SELECT user_id, happened_at
FROM buried_points
WHERE buried_point_id = 2
AND happened_at >= 1613923200
AND happened_at <= 1614182400) AS t2
ON t1.user_id = t2.user_id
AND t1.happened_at < t2.happened_at
AND t2.happened_at - t1.happened_at < 86400;
This is the expected query result:
2 (123,345),1 (123)
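The expected numbers can be cross-checked outside ClickHouse. The following is a minimal Python sketch, not part of the original question: it applies both inequalities (registration strictly after launch, and less than 86400 seconds later) to the sample rows. The function name `retention` is hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (buried_point_id, happened_at, user_id)
EVENTS = [
    (1, 1613923200, 123),
    (1, 1614009600, 345),
    (2, 1613966400, 123),
    (2, 1614009600, 234),
    (2, 1614182400, 345),
]
WINDOW = 86400  # one day in seconds


def retention(events, window=WINDOW):
    """Return (users who launched, users who registered within `window` after a launch)."""
    launches, registrations = defaultdict(list), defaultdict(list)
    for point_id, ts, user in events:
        if point_id == 1:
            launches[user].append(ts)
        elif point_id == 2:
            registrations[user].append(ts)
    launched = set(launches)
    retained = {
        user
        for user in launched
        if any(
            0 < r - l < window
            for l in launches[user]
            for r in registrations.get(user, [])
        )
    }
    return launched, retained


launched, retained = retention(EVENTS)
print(len(launched), sorted(launched), len(retained), sorted(retained))
# 2 [123, 345] 1 [123]
```

User 345 launches and registers, but two days apart, so only user 123 counts as retained.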
However, according to the ClickHouse docs, only one inequality is supported:
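You can use any number of equality conditions and exactly one closest match condition. For example, SELECT count() FROM table_1 ASOF LEFT JOIN table_2 ON table_1.a == table_2.b AND table_2.t <= table_1.t. Conditions supported for the closest match: >, >=, <, <=.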
But I need 2 inequalities to get the job done. Is there a workaround for this?
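One observation that may help: with the closest-match condition t1.happened_at < t2.happened_at, the join pairs each launch with the nearest later registration. If that nearest registration is already outside the window, every later one is too, so the second inequality could be checked on the joined row afterwards instead of inside the ON clause. Below is a small Python sketch of that idea, under the assumption that this is how the closest match behaves. It is not ClickHouse code, and `asof_next` is a hypothetical helper.

```python
from bisect import bisect_right

WINDOW = 86400  # one day in seconds


def asof_next(left_ts, right_sorted):
    """Closest right timestamp strictly greater than left_ts, or None if there is none."""
    i = bisect_right(right_sorted, left_ts)
    return right_sorted[i] if i < len(right_sorted) else None


# Per-user timestamps taken from the question's sample rows.
launches = {123: [1613923200], 345: [1614009600]}
registrations = {123: [1613966400], 234: [1614009600], 345: [1614182400]}

retained = set()
for user, launch_times in launches.items():
    regs = sorted(registrations.get(user, []))
    for t1 in launch_times:
        t2 = asof_next(t1, regs)  # the single closest-match condition
        if t2 is not None and t2 - t1 < WINDOW:  # second inequality, applied afterwards
            retained.add(user)

print(len(launches), len(retained))  # 2 1
```

The counts agree with the expected result of 2 launched users and 1 retained user.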
Solution
Consider using the specialized aggregate function sequenceMatch:
SELECT
    user_id,
    sequenceMatch('(?1)(?2)')(happened_at, buried_point_id = 1, buried_point_id = 2) AS retention
FROM (
    /* emulate test dataset */
    SELECT data.1 AS buried_point_id, data.2 AS happened_at, data.3 AS user_id
    FROM (
        SELECT arrayJoin([
            (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
            (1, 1614009600, 345), (1, 1614009611, 345),
            (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345)]) AS data)
)
WHERE happened_at >= 1613923200 AND happened_at <= 1614182400
GROUP BY user_id
/*
┌─user_id─┬─retention─┐
│ 123 │ 0 │
│ 234 │ 0 │
│ 345 │ 1 │
└─────────┴───────────┘
*/
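Note that the pattern '(?1)(?2)' only checks that a launch is followed by a registration; it does not enforce the 1-day window. The sequenceMatch documentation also describes a time condition of the form (?t operator value), which may be worth trying for the window. The Python sketch below only mirrors the idea and is not a reimplementation of sequenceMatch: the rows are an assumption chosen to agree with the output above, `launch_then_register` and `max_gap` are hypothetical names, and events sharing the same timestamp are treated as not ordered.

```python
from collections import defaultdict

# Assumed test rows (buried_point_id, happened_at, user_id), consistent with the output above.
ROWS = [
    (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
    (1, 1614009600, 345), (1, 1614009611, 345),
    (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345),
]


def launch_then_register(rows, max_gap=None):
    """Per user: 1 if some launch precedes some registration (optionally by < max_gap seconds)."""
    by_user = defaultdict(lambda: {1: [], 2: []})
    for point_id, ts, user in rows:
        by_user[user][point_id].append(ts)
    return {
        user: int(any(
            l < r and (max_gap is None or r - l < max_gap)
            for l in ev[1]
            for r in ev[2]
        ))
        for user, ev in by_user.items()
    }


print(sorted(launch_then_register(ROWS).items()))
# [(123, 0), (234, 0), (345, 1)]
print(sorted(launch_then_register(ROWS, max_gap=86400).items()))
# [(123, 0), (234, 0), (345, 0)]
```

With the window applied, user 345 drops out because the registration comes about two days after the last launch.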
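Calculation based on minIf:
SELECT
    user_id,
    minIf(happened_at, buried_point_id = 1) AS first_launch,
    minIf(happened_at, buried_point_id = 2) AS first_registration,
    first_launch != 0 AND first_registration > first_launch ? 1 : 0 AS is_user_registered_after_launch
FROM (
    /* emulate test dataset */
    SELECT data.1 AS buried_point_id, data.2 AS happened_at, data.3 AS user_id
    FROM (
        SELECT arrayJoin([
            (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123), (1, 1614009600, 345),
            (1, 1614009611, 345), (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345)]) AS data)
)
WHERE happened_at >= 1613923200 AND happened_at <= 1614182400
GROUP BY user_id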
/*
┌─user_id─┬─first_launch─┬─first_registration─┬─is_user_registered_after_launch─┐
│ 123 │ 1613966400 │ 1613923200 │ 0 │
│ 234 │ 0 │ 1614009600 │ 0 │
│ 345 │ 1614009600 │ 1614182400 │ 1 │
└─────────┴──────────────┴────────────────────┴─────────────────────────────────┘
*/
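The minIf approach compares only each user's earliest launch with their earliest registration, and a missing event shows up as 0 in the output. A rough Python equivalent is sketched below. The rows are an assumption chosen to agree with the output above, and `first_events` is a hypothetical name.

```python
# Assumed test rows (buried_point_id, happened_at, user_id), consistent with the output above.
ROWS = [
    (1, 1613966400, 123), (1, 1613966411, 123), (1, 1613966422, 123),
    (1, 1614009600, 345), (1, 1614009611, 345),
    (2, 1613923200, 123), (2, 1614009600, 234), (2, 1614182400, 345),
]


def first_events(rows):
    """Per user: (first_launch, first_registration, registered_after_launch); 0 means no event."""
    firsts = {}
    for point_id, ts, user in rows:
        launch, reg = firsts.get(user, (0, 0))
        if point_id == 1:
            launch = ts if launch == 0 else min(launch, ts)
        elif point_id == 2:
            reg = ts if reg == 0 else min(reg, ts)
        firsts[user] = (launch, reg)
    return {u: (l, r, int(l != 0 and r > l)) for u, (l, r) in firsts.items()}


for user, row in sorted(first_events(ROWS).items()):
    print(user, *row)
# 123 1613966400 1613923200 0
# 234 0 1614009600 0
# 345 1614009600 1614182400 1
```

Because only the earliest events are compared, a registration that precedes the first launch hides any later registration for that user. To enforce the 1-day window as well, the flag would additionally need a check such as `r - l < 86400`.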