SQL: how to select x number of rows before a specific row
I have this table:
ts | user_id | event |
-------------------------------
1500 a eat
1501 a walk
1502 a sleep
1500 b eat
1501 b sleep
1502 b wake
1500 c walk
1501 c eat
1502 c sit
1503 c sleep
1504 c wake
So I want to select x rows before a certain event. Say I want to select the 2 events before sleep for each user_id.
My final table should look like this:
user_id | event | rank |
--------------------------------
a eat 1
a walk 2
a sleep 3
b NULL 0
b eat 1
b sleep 2
c eat 2
c sit 3
c sleep 4
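How can I do this in SQL (specifically Redshift SQL)?
Solution
This is a gaps-and-islands problem, where you want the first row and the last two rows of each island.
The safest approach is probably a window sum over the sleep events to define the groups, followed by row_number() for filtering: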
select *
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end)
                over(partition by user_id order by ts desc) grp
        from mytable t
    ) t
) t
where (rn_asc = 1 or rn_desc <= 2) and grp > 0
order by user_id, ts
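We define the islands with a window count of "sleep" events, taken in descending order of ts. Then we simply enumerate the rows of each island in ascending and descending order, and filter on the records we are interested in.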
  ts | user_id | event | grp | rn_asc | rn_desc
---: | :------ | :---- | --: | -----: | ------:
1500 | a       | eat   |   1 |      1 |       3
1501 | a       | walk  |   1 |      2 |       2
1502 | a       | sleep |   1 |      3 |       1
1500 | b       | eat   |   1 |      1 |       2
1501 | b       | sleep |   1 |      2 |       1
1500 | c       | walk  |   1 |      1 |       4
1502 | c       | sit   |   1 |      3 |       2
1503 | c       | sleep |   1 |      4 |       1
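To check the island logic outside Redshift, here is a minimal sketch that runs the same query against the sample data with Python's built-in sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in for Redshift here, and the names ROWS, ISLAND_QUERY and run_island_query are illustrative, not part of the original answer.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

ISLAND_QUERY = """
select ts, user_id, event, grp, rn_asc, rn_desc
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        -- grp counts the 'sleep' events at or after each row, per user;
        -- rows after the last sleep get grp = 0 and are filtered out.
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end)
                over(partition by user_id order by ts desc) grp
        from mytable t
    ) t
) t
where (rn_asc = 1 or rn_desc <= 2) and grp > 0
order by user_id, ts
"""


def run_island_query():
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    result = conn.execute(ISLAND_QUERY).fetchall()
    conn.close()
    return result


island_rows = run_island_query()
for row in island_rows:
    print(row)
```

For user c the first row of the island is walk, so the result holds walk, sit and sleep. If you only want the sleep row and the two rows right before it, filtering on rn_desc <= 3 (still with grp > 0) would be one option, though that is an assumption about the question's intent and not part of this answer.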
Edit
Redshift requires an explicit frame clause in aggregate window functions that have an order by clause, so the query takes a little more typing:
select *
from (
    select t.*,
        -- Redshift's frame clause does not apply to ranking functions such as row_number()
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end) over(
                partition by user_id
                order by ts desc
                rows between unbounded preceding and current row
            ) grp
        from mytable t
    ) t
) t
where (rn_asc = 1 or rn_desc <= 2) and grp > 0
order by user_id, ts
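As a sanity check that spelling out the frame does not change the grouping, the sketch below computes grp with and without the explicit frame clause and compares the two results. It runs on SQLite through Python's sqlite3 module, which is only a stand-in for Redshift, and GRP_QUERY and grp_values are hypothetical helper names.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

# {frame} is filled with either an empty string or an explicit frame clause.
GRP_QUERY = """
select user_id, ts, event,
    sum(case when event = 'sleep' then 1 else 0 end) over(
        partition by user_id
        order by ts desc
        {frame}
    ) grp
from mytable
order by user_id, ts
"""


def grp_values(frame):
    """Return (user_id, ts, event, grp) rows for the given frame clause text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    rows = conn.execute(GRP_QUERY.format(frame=frame)).fetchall()
    conn.close()
    return rows


implicit_frame = grp_values("")
explicit_frame = grp_values("rows between unbounded preceding and current row")
print(implicit_frame == explicit_frame)
```

The two agree on this data because ts is unique within each user_id; with tied ts values a default range-based frame and a rows-based frame could differ.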
Hmm . . . you can use lead():
select t.*
from (select t.*,
             lead(event) over (partition by user_id order by ts) as next_event,
             lead(event, 2) over (partition by user_id order by ts) as next_event2
      from t
     ) t
where 'sleep' in (event, next_event, next_event2);
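Note: this only returns rows that exist in the data. If you need to generate rows (such as the NULL row for user b), additional logic is needed.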
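Here is a small sketch of the lead() approach run against the question's sample data, using Python's sqlite3 module as a stand-in for Redshift. The event_rank column is an addition of this sketch (a row_number() per user, meant to mimic the rank column in the question), and LEAD_QUERY and run_lead_query are illustrative names.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

LEAD_QUERY = """
select user_id, event, event_rank
from (
    select t.*,
        -- position of the row within its user's timeline (assumed meaning of "rank")
        row_number() over (partition by user_id order by ts) as event_rank,
        lead(event) over (partition by user_id order by ts) as next_event,
        lead(event, 2) over (partition by user_id order by ts) as next_event2
    from t
) t
where 'sleep' in (event, next_event, next_event2)
order by user_id, ts
"""


def run_lead_query():
    """Load the sample rows into an in-memory table named t and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (ts integer, user_id text, event text)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(LEAD_QUERY).fetchall()
    conn.close()
    return result


lead_rows = run_lead_query()
for row in lead_rows:
    print(row)
```

On the sample data this yields eat, walk, sleep for user a, eat, sleep for user b, and eat, sit, sleep for user c. The padded row for user b (NULL with rank 0) is not produced, which is the limitation the note above describes.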
Edit:
You can actually generalize this:
select t.*
from (select t.*,
             sum(case when event = 'sleep' then 1 else 0 end) over (
                 partition by user_id
                 order by ts
                 rows between current row and 2 following
             ) as cnt_sleep
      from t
     ) t
where cnt_sleep > 0;
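This counts the number of "sleep" events in a window of n rows: the current row plus the n - 1 rows that follow it (here n = 3, hence 2 following). A row is returned if "sleep" is found anywhere in that window.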
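To make the role of the frame width concrete, the sketch below wraps the generalized query in a hypothetical helper, rows_up_to_event, that takes the target event and the number of preceding rows to keep. It uses Python's sqlite3 module as a stand-in for Redshift. The frame offset is formatted into the SQL text after an int() check, since it is assumed here that a frame bound must be a literal rather than a bound parameter.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]


def make_connection():
    """Create an in-memory table named t holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (ts integer, user_id text, event text)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    return conn


def rows_up_to_event(conn, target_event, n_before):
    """Return each target_event row plus up to n_before rows preceding it."""
    n_before = int(n_before)  # only a validated integer goes into the SQL text
    if n_before < 0:
        raise ValueError("n_before must be non-negative")
    query = f"""
    select ts, user_id, event
    from (
        select t.*,
            sum(case when event = ? then 1 else 0 end) over (
                partition by user_id
                order by ts
                rows between current row and {n_before} following
            ) as cnt_target
        from t
    ) t
    where cnt_target > 0
    order by user_id, ts
    """
    return conn.execute(query, (target_event,)).fetchall()


conn = make_connection()
for n_before in (0, 1, 2):
    print(n_before, rows_up_to_event(conn, "sleep", n_before))
```

With n_before = 0 only the sleep rows come back; each increment widens the window by one following row, so one more preceding event per sleep is kept.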