微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

sql-server – 将每日时间表分组为[开始日期;结束日期]与工作日列表的间隔

我需要在两个系统之间转换数据.

一个系统将日程表存储为日期的简单列表.计划中包含的每个日期都是一行.
日期顺序(周末,公众假期和更长的停顿)可能存在各种差距,一周中的某些日子可能会被排除在日程表之外.根本没有差距,甚至可以包括周末.时间表最长可达2年.通常是几周.

以下是一个简单的计划示例,该计划跨越两周,不包括周末(下面的脚本中有更复杂的示例):

+----+------------+------------+---------+--------+
| ID | ContractID |     dt     | dowChar | dowInt |
+----+------------+------------+---------+--------+
| 10 |          1 | 2016-05-02 | Mon     |      2 |
| 11 |          1 | 2016-05-03 | Tue     |      3 |
| 12 |          1 | 2016-05-04 | Wed     |      4 |
| 13 |          1 | 2016-05-05 | Thu     |      5 |
| 14 |          1 | 2016-05-06 | Fri     |      6 |
| 15 |          1 | 2016-05-09 | Mon     |      2 |
| 16 |          1 | 2016-05-10 | Tue     |      3 |
| 17 |          1 | 2016-05-11 | Wed     |      4 |
| 18 |          1 | 2016-05-12 | Thu     |      5 |
| 19 |          1 | 2016-05-13 | Fri     |      6 |
+----+------------+------------+---------+--------+

ID是唯一的,但它不一定是顺序的(它是主键).
日期在每个合约中都是唯一的(在(ContractID,dt)上有唯一索引).

第二个系统将时间表存储为时间间隔,其中包含作为时间表一部分的工作日列表.每个间隔由其开始和结束日期(包括)和计划中包含的工作日列表定义.在这种格式中,您可以有效地定义重复的每周模式,例如周一至周三,但是当模式被中断时(例如公共假期),它会变得很痛苦.

以下简单示例如下所示:

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          1 | 2016-05-02 | 2016-05-13 |       10 | Mon,Tue,Wed,Thu,Fri,|
+------------+------------+------------+----------+----------------------+

属于同一合同的[StartDT; EndDT]间隔不应重叠.

我需要将第一个系统的数据转换为第二个系统使用的格式.
目前我在C#的客户端为单个给定的合同解决了这个问题,
但我想在服务器端的T-sql中进行批量处理和服务器之间的导出/导入.
最有可能的是,它可以使用CLR UDF完成,但在此阶段我不能使用sqlCLR.

这里的挑战是使间隔列表尽可能简短和人性化.

例如,这个时间表:

+-----+------------+------------+---------+--------+
| ID  | ContractID |     dt     | dowChar | dowInt |
+-----+------------+------------+---------+--------+
| 223 |          2 | 2016-05-05 | Thu     |      5 |
| 224 |          2 | 2016-05-06 | Fri     |      6 |
| 225 |          2 | 2016-05-09 | Mon     |      2 |
| 226 |          2 | 2016-05-10 | Tue     |      3 |
| 227 |          2 | 2016-05-11 | Wed     |      4 |
| 228 |          2 | 2016-05-12 | Thu     |      5 |
| 229 |          2 | 2016-05-13 | Fri     |      6 |
| 230 |          2 | 2016-05-16 | Mon     |      2 |
| 231 |          2 | 2016-05-17 | Tue     |      3 |
+-----+------------+------------+---------+--------+

应该成为这样的:

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          2 | 2016-05-05 | 2016-05-17 |        9 | Mon,|
+------------+------------+------------+----------+----------------------+

,不是这个:

+------------+------------+------------+----------+----------------------+
| ContractID |  StartDT   |   EndDT    | DayCount |       WeekDays       |
+------------+------------+------------+----------+----------------------+
|          2 | 2016-05-05 | 2016-05-06 |        2 | Thu,|
|          2 | 2016-05-09 | 2016-05-13 |        5 | Mon,|
|          2 | 2016-05-16 | 2016-05-17 |        2 | Mon,|
+------------+------------+------------+----------+----------------------+

我尝试对这个问题应用间隙和岛屿方法.我尝试过两次通过.在第一次通过中,我发现连续几天的岛屿,即岛屿的末端是天数,天周,公众假期或其他东西的任何差距.对于每个这样的岛屿,我建立了一个以逗号分隔的不同周末列表.在第二次通过中,I组通过查看周数序列中的间隙或WeekDays的变化进一步发现了岛屿.

使用这种方法,每个部分周最终作为额外的间隔,如上所示,因为即使周数是连续的,周日也会改变.此外,一周内可能存在规律的差距(参见样本数据中的ContractID = 3,其数据仅为周一,周三,周五),这种方法会在此类计划中为每一天生成单独的间隔.好的一面是,如果时间表根本没有任何间隙,它会生成一个时间间隔(请参阅包含周末的样本数据中的ContractID = 7),在这种情况下,开始或结束周是否是部分无关紧要.

请参阅下面脚本中的其他示例,以便更好地了解我的目标.您可以看到周末被排除在外,但也可以排除一周中的任何其他日子.在示例3中,只有周一,周三和周五是计划的一部分.此外,可以包括周末,如示例7所示.解决方案应该平等地处理一周中的所有日子.可以在计划中包括或排除一周中的任何一天.

要验证生成的间隔列表是否正确描述了给定的调度,您可以使用以下伪代码

>循环所有间隔
>对于开始日期和结束日期(包括)之间的所有日历日期的每个间隔循环.
>对于每个日期检查,是否在WeekDays中列出其星期几.如果是,则该日期包含在日程表中.

希望这可以澄清在什么情况下应该创建新的间隔.在示例4和5中,从周期的中间移除一个星期一(2016-05-09),并且这样的调度不能由单个间隔表示.在示例6中,计划中存在长的间隙,因此需要两个间隔.

间隔表示计划中的每周模式,并且当模式被中断/更改时,必须添加新的间隔.在示例11中,前三周有一个模式Tue,然后这个模式变为星期四.因此,我们需要两个时间间隔来描述这样的时间表.

我目前正在使用sql Server 2008,因此解决方案应该适用于此版本.
如果可以使用更高版本的功能简化/改进sql Server 2008的解决方案,这是一个奖励,请同时显示它.

我有一个Calendar表(日期列表)和Numbers表(从1开始的整数列表),所以如果需要,可以使用它们.创建临时表并使用多个查询可以分几个阶段处理数据也是可以的.算法中的阶段数必须修复,游标和显式WHILE循环不正常.

样本数据和预期结果的脚本

-- @Src is sample data
-- @Dst is expected result

DECLARE @Src TABLE (ID int PRIMARY KEY,ContractID int,dt date,dowChar char(3),dowInt int);
INSERT INTO @Src (ID,ContractID,dt,dowChar,dowInt) VALUES

-- simple two weeks (without weekend)
(110,1,'2016-05-02','Mon',2),(111,'2016-05-03','Tue',3),(112,'2016-05-04','Wed',4),(113,'2016-05-05','Thu',5),(114,'2016-05-06','Fri',6),(115,'2016-05-09',(116,'2016-05-10',(117,'2016-05-11',(118,'2016-05-12',(119,'2016-05-13',-- a partial end of the week,the whole week,partial start of the week (without weekends)
(223,2,(224,(225,(226,(227,(228,(229,(230,'2016-05-16',(231,'2016-05-17',-- only Mon,Fri are included across two weeks plus partial third week
(310,3,(311,(314,(315,(317,(319,(330,-- a whole week (without weekend),in the second week Mon is not included
(410,4,(411,(412,(413,(414,(416,(417,(418,(419,-- three weeks,but without Mon in the second week (no weekends)
(510,5,(511,(512,(513,(514,(516,(517,(518,(519,(520,(521,(522,'2016-05-18',(523,'2016-05-19',(524,'2016-05-20',-- long gap between two intervals
(623,6,(624,(625,(626,(627,(628,(629,(630,(631,(645,'2016-06-06',(646,'2016-06-07',(647,'2016-06-08',(648,'2016-06-09',(649,'2016-06-10',(655,'2016-06-13',(656,'2016-06-14',(657,'2016-06-15',(658,'2016-06-16',(659,'2016-06-17',-- two weeks,no gaps between days at all,even weekends are included
(710,7,(711,(712,(713,(714,(715,'2016-05-07','Sat',7),(716,'2016-05-08','Sun',1),(725,(726,(727,(728,(729,-- no gaps between days at all,even weekends are included,with partial weeks
(805,8,'2016-04-30',(806,'2016-05-01',(810,(811,(812,(813,(814,(815,(816,(825,(826,(827,(828,(829,(830,'2016-05-14',-- only Mon-Wed included,two weeks plus partial third week
(910,9,(911,(912,(915,(916,(917,(930,(931,-- only Thu-Sun included,three weeks
(1013,10,(1014,(1015,(1016,(1018,(1019,(1020,(1021,'2016-05-15',(1023,(1024,(1025,'2016-05-21',(1026,'2016-05-22',-- only Tue for first three weeks,then only Thu for the next three weeks
(1111,11,(1116,(1131,(1123,(1124,'2016-05-26',(1125,'2016-06-02',-- one week,then one week gap,then one week
(1210,12,(1211,(1212,(1213,(1214,(1215,(1216,(1217,(1218,(1219,6);

SELECT ID,dowInt
FROM @Src
ORDER BY ContractID,dt;


DECLARE @Dst TABLE (ContractID int,StartDT date,EndDT date,DayCount int,WeekDays varchar(255));
INSERT INTO @Dst (ContractID,StartDT,EndDT,DayCount,WeekDays) VALUES
(1,'Mon,'),(2,(3,(4,'Tue,(5,(6,(7,'Sun,Mon,Sat,(8,15,(9,(10,(11,'Thu,(12,');

SELECT ContractID,WeekDays
FROM @Dst
ORDER BY ContractID,StartDT;

答案比较

真实的表@Src有403,555行,有15,857个不同的ContractID.
所有答案都会产生正确的结果(至少对我的数据而言)并且所有答案都相当快,但它们的最优性不同.生成的间隔越少越好.为了好奇,我把运行时间包括在内.主要关注的是正确和最佳的结果,而不是速度(除非需要太长时间 – 我在10分钟后停止了Ziggy Crueltyfree Zeitgeister的非递归查询).

+--------------------------------------------------------+-----------+---------+
|                         Answer                         | Intervals | Seconds |
+--------------------------------------------------------+-----------+---------+
| Ziggy Crueltyfree Zeitgeister                          |     25751 |    7.88 |
| While loop                                             |           |         |
|                                                        |           |         |
| Ziggy Crueltyfree Zeitgeister                          |     25751 |    8.27 |
| Recursive                                              |           |         |
|                                                        |           |         |
| Michael Green                                          |     25751 |   22.63 |
| Recursive                                              |           |         |
|                                                        |           |         |
| Geoff Patterson                                        |     26670 |    4.79 |
| Weekly gaps-and-islands with merging of partial weeks  |           |         |
|                                                        |           |         |
| Vladimir Baranov                                       |     34560 |    4.03 |
| Daily,then weekly gaps-and-islands                    |           |         |
|                                                        |           |         |
| Mikael Eriksson                                        |     35840 |    0.65 |
| Weekly gaps-and-islands                                |           |         |
+--------------------------------------------------------+-----------+---------+
| Vladimir Baranov                                       |     25751 |  121.51 |
| Cursor                                                 |           |         |
+--------------------------------------------------------+-----------+---------+

解决方法

这个使用递归CTE.其结果与问题中的示例相同.提出这个噩梦……代码包括评论,以缓解其错综复杂的逻辑.
SET DATEFirsT 1 -- Make Monday weekday=1

DECLARE @Ranked TABLE (RowID int NOT NULL IDENTITY PRIMARY KEY,-- Incremental uninterrupted sequence in the right order
                       ID int NOT NULL UNIQUE,ContractID int NOT NULL,-- Original relevant values (ID is not really necessary)
                       WeekNo int NOT NULL,dowBit int NOT NULL);                 -- Useful to find gaps in days or weeks
INSERT INTO @Ranked
SELECT ID,DATEDIFF(WEEK,'1900-01-01',DATEADD(DAY,1-DATEPART(dw,dt),dt)) AS WeekNo,POWER(2,DATEPART(dw,dt)-1) AS dowBit
FROM @Src
ORDER BY ContractID,WeekNo,dowBit

/*
Each evaluated date makes part of the carried sequence if:
  - this is not a new contract,and
    - sequence started this week,or
    - same day last week was part of the sequence,or
    - sequence started last week and today is a lower day than the accumulated weekdays list
  - and there are no sequence gaps since prevIoUs day
(otherwise it does not make part of the old sequence,so it starts a new one) */

DECLARE @RankedRanges TABLE (RowID int NOT NULL PRIMARY KEY,WeekDays int NOT NULL,StartRowID int NULL);

WITH WeeksCTE AS -- Needed for building the sequence gradually,and comparing the carried sequence (and prevIoUs day) with a current evaluated day
( 
    SELECT RowID,dowBit,RowID AS StartRowID,WeekNo AS StartWN,dowBit AS WeekDays,dowBit AS StartWeekDays
    FROM @Ranked
    WHERE RowID = 1 
    UNION ALL
    SELECT RowID,StartRowID,CASE WHEN StartRowID IS NULL THEN StartWN ELSE WeekNo END AS WeekNo,CASE WHEN StartRowID IS NULL THEN WeekDays | dowBit ELSE dowBit END AS WeekDays,CASE WHEN StartRowID IS NOT NULL THEN dowBit WHEN WeekNo = StartWN THEN StartWeekDays | dowBit ELSE StartWeekDays END AS StartWeekDays
    FROM (
        SELECT w.*,pre.StartWN,pre.WeekDays,pre.StartWeekDays,CASE WHEN w.ContractID <> pre.ContractID OR     -- New contract always break the sequence
                         NOT (w.WeekNo = pre.StartWN OR        -- Same week as a new sequence always keeps the sequence
                              w.dowBit & pre.WeekDays > 0 OR   -- Days in the sequence keep the sequence (provided there are no gaps,checked later)
                              (w.WeekNo = pre.StartWN+1 AND (w.dowBit-1) & pre.StartWeekDays = 0)) OR -- Days in the second week when less than a week passed since the sequence started remain in sequence
                         (w.WeekNo > pre.StartWN AND -- look for gap after initial week
                          w.WeekNo > pre.WeekNo+1 OR -- look for full-week gaps
                          (w.WeekNo = pre.WeekNo AND                            -- when same week as prevIoUs day,((w.dowBit-1) ^ (pre.dowBit*2-1)) & pre.WeekDays > 0 -- days between this and prevIoUs weekdays,compared to current series
                          ) OR
                          (w.WeekNo > pre.WeekNo AND                                   -- when following week of prevIoUs day,((-1 ^ (pre.dowBit*2-1)) | (w.dowBit-1)) & pre.WeekDays > 0 -- days between this and prevIoUs weekdays,compared to current series
                          )) THEN w.RowID END AS StartRowID
        FROM WeeksCTE pre
        JOIN @Ranked w ON (w.RowID = pre.RowID + 1)
        ) w
) 
INSERT INTO @RankedRanges -- days sequence and starting point of each sequence
SELECT RowID,WeekDays,StartRowID
--SELECT *
FROM WeeksCTE
OPTION (MAXRECURSION 0)

--SELECT * FROM @RankedRanges

DECLARE @Ranges TABLE (RowNo int NOT NULL IDENTITY PRIMARY KEY,RowID int NOT NULL);

INSERT INTO @Ranges       -- @RankedRanges filtered only by start of each range,with numbered rows to easily find the end of each range
SELECT StartRowID
FROM @RankedRanges
WHERE StartRowID IS NOT NULL
ORDER BY 1

-- Final result putting everything together
SELECT rs.ContractID,rs.dt AS StartDT,re.dt AS EndDT,re.RowID-rs.RowID+1 AS DayCount,CASE WHEN rr.WeekDays & 64 > 0 THEN 'Sun,' ELSE '' END +
       CASE WHEN rr.WeekDays & 1 > 0 THEN 'Mon,' ELSE '' END +
       CASE WHEN rr.WeekDays & 2 > 0 THEN 'Tue,' ELSE '' END +
       CASE WHEN rr.WeekDays & 4 > 0 THEN 'Wed,' ELSE '' END +
       CASE WHEN rr.WeekDays & 8 > 0 THEN 'Thu,' ELSE '' END +
       CASE WHEN rr.WeekDays & 16 > 0 THEN 'Fri,' ELSE '' END +
       CASE WHEN rr.WeekDays & 32 > 0 THEN 'Sat,' ELSE '' END AS WeekDays
FROM (
    SELECT r.RowID AS StartRowID,COALESCE(pos.RowID-1,(SELECT MAX(RowID) FROM @Ranked)) AS EndRowID
    FROM @Ranges r
    LEFT JOIN @Ranges pos ON (pos.RowNo = r.RowNo + 1)
    ) g
JOIN @Ranked rs ON (rs.RowID = g.StartRowID)
JOIN @Ranked re ON (re.RowID = g.EndRowID)
JOIN @RankedRanges rr ON (rr.RowID = re.RowID)

另一种策略

这个应该比前一个快得多,因为它不依赖于sql Server 2008中缓慢有限的递归CTE,尽管它实现了或多或少相同的策略.

一个WHILE循环(我无法设法避免它),但是减少了迭代次数(任何给定契约的最大序列数(减1)).

这是一个简单的策略,可用于短于或长于一周的序列(替换任何其他数字的常数7的任何出现,以及从DayNo的MODULUS x而不是DATEPART(wk)计算的dowBit),直到32.

SET DATEFirsT 1 -- Make Monday weekday=1

-- Get the minimum information needed to calculate sequences
DECLARE @Days TABLE (ContractID int NOT NULL,DayNo int NOT NULL,dowBit int NOT NULL,PRIMARY KEY (ContractID,DayNo));
INSERT INTO @Days
SELECT ContractID,CAST(CAST(dt AS datetime) AS int) AS DayNo,dt)-1) AS dowBit
FROM @Src

DECLARE @RangeStartFirstPass TABLE (ContractID int NOT NULL,DayNo))

-- Calculate,from the above list,which days are not present in the prevIoUs 7
INSERT INTO @RangeStartFirstPass
SELECT r.ContractID,r.DayNo
FROM @Days r
LEFT JOIN @Days pr ON (pr.ContractID = r.ContractID AND pr.DayNo BETWEEN r.DayNo-7 AND r.DayNo-1) -- Last 7 days
GROUP BY r.ContractID,r.DayNo,r.dowBit
HAVING r.dowBit & COALESCE(SUM(pr.dowBit),0) = 0

-- Update the prevIoUs list with all days that occur right after a missing day
INSERT INTO @RangeStartFirstPass
SELECT *
FROM (
    SELECT disTINCT ContractID,(SELECT MIN(DayNo) FROM @Days WHERE ContractID = d.ContractID AND DayNo > d.DayNo + 7) AS DayNo
    FROM @Days d
    WHERE NOT EXISTS (SELECT 1 FROM @Days WHERE ContractID = d.ContractID AND DayNo = d.DayNo + 7)
    ) d
WHERE DayNo IS NOT NULL AND
      NOT EXISTS (SELECT 1 FROM @RangeStartFirstPass WHERE ContractID = d.ContractID AND DayNo = d.DayNo)

DECLARE @RangeStart TABLE (ContractID int NOT NULL,DayNo));

-- Fetch the first sequence for each contract
INSERT INTO @RangeStart
SELECT ContractID,MIN(DayNo)
FROM @RangeStartFirstPass
GROUP BY ContractID

-- Add to the list above the next sequence for each contract,until all are added
-- (ensure no sequence is added with less than 7 days)
WHILE @@ROWCOUNT > 0
  INSERT INTO @RangeStart
  SELECT f.ContractID,MIN(f.DayNo)
  FROM (SELECT ContractID,MAX(DayNo) AS DayNo FROM @RangeStart GROUP BY ContractID) s
  JOIN @RangeStartFirstPass f ON (f.ContractID = s.ContractID AND f.DayNo > s.DayNo + 7)
  GROUP BY f.ContractID

-- Summarise results
SELECT ContractID,CASE WHEN WeekDays & 64 > 0 THEN 'Sun,' ELSE '' END +
       CASE WHEN WeekDays & 1 > 0 THEN 'Mon,' ELSE '' END +
       CASE WHEN WeekDays & 2 > 0 THEN 'Tue,' ELSE '' END +
       CASE WHEN WeekDays & 4 > 0 THEN 'Wed,' ELSE '' END +
       CASE WHEN WeekDays & 8 > 0 THEN 'Thu,' ELSE '' END +
       CASE WHEN WeekDays & 16 > 0 THEN 'Fri,' ELSE '' END +
       CASE WHEN WeekDays & 32 > 0 THEN 'Sat,' ELSE '' END AS WeekDays
FROM (
    SELECT r.ContractID,MIN(d.dt) AS StartDT,MAX(d.dt) AS EndDT,COUNT(*) AS DayCount,SUM(disTINCT d.dowBit) AS WeekDays
    FROM (SELECT *,COALESCE((SELECT MIN(DayNo) FROM @RangeStart WHERE ContractID = rs.ContractID AND DayNo > rs.DayNo),999999) AS DayEnd FROM @RangeStart rs) r
    JOIN @Days d ON (d.ContractID = r.ContractID AND d.DayNo BETWEEN r.DayNo AND r.DayEnd-1)
    GROUP BY r.ContractID,r.DayNo
    ) d
ORDER BY ContractID,StartDT

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐