Grouping data by a status that changes over time

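I am trying to assign group numbers to distinct runs of rows in a data set whose values change over time. In my example, the fields that change are tran_seq, prog_id, deg_id, cur_id, and enroll_status. Whenever any of these fields differs from the previous row, I need a new group number; if they all match the previous row, the group number should stay the same. When I tried ROW_NUMBER(), RANK(), or DENSE_RANK(), I got increasing values within the same group (for example, the first 2 rows of the sample). I think I need to order by start_date, since this is temporal data.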

+----+----------+---------+--------+--------+---------------+------------+------------+---------+
|    | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date |  end_date  | desired |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| 1  |    1     |   6     |   9    |   3    |     ENRL      | 2004-08-22 | 2004-12-11 |    1    |
| 2  |    1     |   6     |   9    |   3    |     ENRL      | 2006-01-10 | 2006-05-06 |    1    |
| 3  |    1     |   6     |   9    |   59   |     ENRL      | 2006-08-29 | 2006-12-16 |    2    |
| 4  |    2     |   12    |   23   |   45   |     ENRL      | 2014-01-21 | 2014-05-16 |    3    |
| 5  |    2     |   12    |   23   |   45   |     ENRL      | 2014-08-18 | 2014-12-05 |    3    |
| 6  |    2     |   12    |   23   |   45   |     LOAP      | 2015-01-20 | 2015-05-15 |    4    |
| 7  |    2     |   12    |   23   |   45   |     ENRL      | 2015-08-25 | 2015-12-11 |    5    |
| 8  |    2     |   12    |   23   |   45   |     LOAP      | 2016-01-12 | 2016-05-06 |    6    |
| 9  |    2     |   12    |   23   |   45   |     ENRL      | 2016-05-16 | 2016-08-05 |    7    |
| 10 |    2     |   12    |   23   |   45   |     LOAJ      | 2016-08-23 | 2016-12-02 |    8    |
| 11 |    2     |   12    |   23   |   45   |     ENRL      | 2017-01-18 | 2017-05-05 |    9    |
| 12 |    2     |   12    |   23   |   45   |     ENRL      | 2018-01-17 | 2018-05-11 |    9    |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+

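Once the rows are numbered, I think I can group by that number to reach my final goal: a timeline of the distinct statuses, each with a start date and an end date. For the sample data above, that would be: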

+---+----------+---------+--------+--------+---------------+------------+------------+
|   | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date |  end_date  |
+---+----------+---------+--------+--------+---------------+------------+------------+
| 1 |    1     |   6     |   9    |   3    |     ENRL      | 2004-08-22 | 2006-05-06 |
| 2 |    1     |   6     |   9    |   59   |     ENRL      | 2006-08-29 | 2006-12-16 |
| 3 |    2     |   12    |   23   |   45   |     ENRL      | 2014-01-21 | 2014-12-05 |
| 4 |    2     |   12    |   23   |   45   |     LOAP      | 2015-01-20 | 2015-05-15 |
| 5 |    2     |   12    |   23   |   45   |     ENRL      | 2015-08-25 | 2015-12-11 |
| 6 |    2     |   12    |   23   |   45   |     LOAP      | 2016-01-12 | 2016-05-06 |
| 7 |    2     |   12    |   23   |   45   |     ENRL      | 2016-05-16 | 2016-08-05 |
| 8 |    2     |   12    |   23   |   45   |     LOAJ      | 2016-08-23 | 2016-12-02 |
| 9 |    2     |   12    |   23   |   45   |     ENRL      | 2017-01-18 | 2018-05-11 |
+---+----------+---------+--------+--------+---------------+------------+------------+

Solution

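This is a classic XY problem: you are asking for help with an intermediate step of your own attempted solution rather than with the underlying problem itself.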
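For reference, the intermediate numbering asked about (the desired column) could be produced by flagging each row that differs from its predecessor and then taking a running total of those flags. The following is a minimal sketch, not part of the original answer. It assumes SQL Server 2012 or later (for LAG), ordering by start_date, and no NULLs in the compared fields; the names @src, flagged, is_new_group, and grp_no are illustrative.

```sql
-- Sample rows from the question; @src is a hypothetical table variable name.
declare @src table(tran_seq int, prog_id int, deg_id int, cur_id int,
                   enroll_status varchar(4), start_date date, end_date date, desired int);
insert into @src values
 (1,6,9,3,'ENRL','2004-08-22','2004-12-11',1)
,(1,6,9,3,'ENRL','2006-01-10','2006-05-06',1)
,(1,6,9,59,'ENRL','2006-08-29','2006-12-16',2)
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16',3)
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05',3)
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15',4)
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11',5)
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06',6)
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05',7)
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02',8)
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05',9)
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11',9)
;

with flagged as (
    select *,
           -- 0 when every tracked field equals the previous row's value, else 1.
           -- The first row has no predecessor (LAG returns NULL), so it is flagged 1.
           case when lag(tran_seq)      over (order by start_date) = tran_seq
                 and lag(prog_id)       over (order by start_date) = prog_id
                 and lag(deg_id)        over (order by start_date) = deg_id
                 and lag(cur_id)        over (order by start_date) = cur_id
                 and lag(enroll_status) over (order by start_date) = enroll_status
                then 0 else 1 end as is_new_group
    from @src
)
select *,
       -- Running total of the change flags: increases by 1 at each change.
       sum(is_new_group) over (order by start_date rows unbounded preceding) as grp_no
from flagged
order by start_date;
```

On the sample rows, grp_no should match the desired column. If the real table holds more than one student, each window would also need a PARTITION BY on the student key, which the question does not show.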

However, because you added your overall goal as an addendum, you can reach it without the intermediate step:

-- Sample data from the question, loaded into a table variable
declare @t table(tran_seq int,prog_id int,deg_id int,cur_id int,enroll_status varchar(4),start_date date,end_date date,desired int);
insert into @t values
 (1,6,9,3,'ENRL','2004-08-22','2004-12-11',1)
,(1,6,9,3,'ENRL','2006-01-10','2006-05-06',1)
,(1,6,9,59,'ENRL','2006-08-29','2006-12-16',2)
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16',3)
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05',3)
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15',4)
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11',5)
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06',6)
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05',7)
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02',8)
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05',9)
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11',9)
;

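-- Within an unbroken run of identical rows both row numbers advance together,
-- so their difference (grp) stays constant for that run.
select tran_seq,prog_id,deg_id,cur_id,enroll_status,min(start_date) as start_date,max(end_date) as end_date
from(select *,row_number() over (order by end_date) - row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status order by end_date) as grp
     from @t
    ) AS g
group by tran_seq,prog_id,deg_id,cur_id,enroll_status,grp
order by start_date;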

Output

+----------+---------+--------+--------+---------------+------------+------------+
| tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date |  end_date  |
+----------+---------+--------+--------+---------------+------------+------------+
|        1 |       6 |      9 |      3 | ENRL          | 2004-08-22 | 2006-05-06 |
|        1 |       6 |      9 |     59 | ENRL          | 2006-08-29 | 2006-12-16 |
|        2 |      12 |     23 |     45 | ENRL          | 2014-01-21 | 2014-12-05 |
|        2 |      12 |     23 |     45 | LOAP          | 2015-01-20 | 2015-05-15 |
|        2 |      12 |     23 |     45 | ENRL          | 2015-08-25 | 2015-12-11 |
|        2 |      12 |     23 |     45 | LOAP          | 2016-01-12 | 2016-05-06 |
|        2 |      12 |     23 |     45 | ENRL          | 2016-05-16 | 2016-08-05 |
|        2 |      12 |     23 |     45 | LOAJ          | 2016-08-23 | 2016-12-02 |
|        2 |      12 |     23 |     45 | ENRL          | 2017-01-18 | 2018-05-11 |
+----------+---------+--------+--------+---------------+------------+------------+
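To see why subtracting one row number from another isolates each run, it can help to select the two counters side by side. The following is a minimal sketch on the question's sample rows, assuming the per-status counter is partitioned by all five tracked fields; the names @demo, rn_all, and rn_in_state are illustrative and not from the original answer.

```sql
-- Sample rows from the question; @demo is a hypothetical table variable name.
declare @demo table(tran_seq int, prog_id int, deg_id int, cur_id int,
                    enroll_status varchar(4), start_date date, end_date date);
insert into @demo values
 (1,6,9,3,'ENRL','2004-08-22','2004-12-11')
,(1,6,9,3,'ENRL','2006-01-10','2006-05-06')
,(1,6,9,59,'ENRL','2006-08-29','2006-12-16')
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16')
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05')
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15')
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11')
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06')
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05')
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02')
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05')
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11')
;

select tran_seq, cur_id, enroll_status, end_date,
       -- Position of the row in the whole timeline.
       row_number() over (order by end_date) as rn_all,
       -- Position of the row among rows with identical tracked fields.
       row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status
                          order by end_date) as rn_in_state,
       -- Constant inside a run; changes once a different status interrupts it.
       row_number() over (order by end_date)
     - row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status
                          order by end_date) as grp
from @demo
order by end_date;
```

Note that grp alone is not unique: on this data the 2015 LOAP row and the 2016-05-16 ENRL row both get grp = 5. That is why the final query groups by the tracked fields together with grp rather than by grp only.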
