Teradata: overlapping periods with gaps


I would appreciate help grouping overlapping periods. This is my source data in table1 (dates are in DD.MM.YYYY format):

Start_Date  End_Date    Status  Id    Main_Id
01.01.2020  03.05.2020  0       11    1
01.02.2020  14.04.2020  3       12    1
14.04.2020  15.05.2020  5       13    1
10.05.2020  20.05.2020  0       14    1
22.05.2020  25.05.2020  2       15    1

Desired output:

Valid_Period        Decision      Main_Id
01.01.2020  01.02.2020  NOK       1
01.02.2020  03.05.2020  DOOMED    1
03.05.2020  10.05.2020  BAD       1
10.05.2020  15.05.2020  DOOMED    1
15.05.2020  20.05.2020  NOK       1
20.05.2020  22.05.2020  OK        1
22.05.2020  25.05.2020  BAD       1

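Depending on which statuses are active in a given period, the output can contain 4 different decisions:

Status 0 exists (and no status >0 at the same time): 'NOK'
Status >0 exists (but no status 0 at the same time): 'BAD'
Status >0 and status 0 exist at the same time: 'DOOMED'
No status exists at all (this covers the time gaps between periods): 'OK'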
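The four rules depend on only two facts about a time slice: whether any status 0 is active and whether any status >0 is active. A minimal Python sketch (the function name `decide` is hypothetical, for illustration only) encodes them as a two-bit mask, one bit per fact:

```python
# Hypothetical helper: map the statuses active during one time slice to a decision.
# Bit 1 is set if any status 0 is active, bit 2 if any status > 0 is active.
DECISION_BY_MASK = {0: "OK", 1: "NOK", 2: "BAD", 3: "DOOMED"}


def decide(active_statuses):
    mask = (1 if any(s == 0 for s in active_statuses) else 0) + (
        2 if any(s > 0 for s in active_statuses) else 0
    )
    return DECISION_BY_MASK[mask]


print(decide([]))      # OK      (nothing active: a gap)
print(decide([0]))     # NOK
print(decide([3, 5]))  # BAD
print(decide([0, 5]))  # DOOMED
```

Summing independent bits means the order and number of overlapping rows do not matter, only which kinds of status are present.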

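I will use the output later for reporting, so I need to be able to get the correct result for each Main_Id at any given point in time. So far I have tried one SELECT that groups the overlapping status 0 and status >0 periods, a separate SELECT that covers the time gaps, and UNION ALL to put them together: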

select normalize on meets or overlaps
       case
         when table1.Status = 0 then 'NOK'
         when table1.Status > 0 then 'BAD'
       end as Decision,
       period(table1.Start_Date, coalesce(table1.End_Date, cast('9999-01-01' as date))) as valid_period,
       table1.Main_Id
from table1

union all

select 'OK' as Decision,
       period(a.prev_end_date, a.Start_Date) as valid_period,
       a.Main_Id
from (
    select table1.Main_Id,
           -- end date of the previous row within the same Main_Id
           lag(coalesce(table1.End_Date, cast('9999-01-01' as date))) over (
               partition by table1.Main_Id
               order by table1.Start_Date) as prev_end_date,
           table1.Start_Date
    from table1
    qualify prev_end_date < table1.Start_Date  -- keep only real gaps
) a
;
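The gap-filling half of this query can be simulated in Python to see exactly what the LAG comparison does. This is a sketch over the sample rows, assuming End_Date is never NULL; `ROWS` and `lag_gaps` are illustrative names, not part of any schema. One caveat it makes visible: LAG only looks at the immediately preceding row, so if an earlier row extends past the previous row's end, a gap can be reported where none exists; comparing against a running maximum of End_Date avoids that.

```python
from datetime import date

# Sample rows from table1: (Start_Date, End_Date, Status, Id, Main_Id).
ROWS = [
    (date(2020, 1, 1), date(2020, 5, 3), 0, 11, 1),
    (date(2020, 2, 1), date(2020, 4, 14), 3, 12, 1),
    (date(2020, 4, 14), date(2020, 5, 15), 5, 13, 1),
    (date(2020, 5, 10), date(2020, 5, 20), 0, 14, 1),
    (date(2020, 5, 22), date(2020, 5, 25), 2, 15, 1),
]


def lag_gaps(rows):
    """Simulate the gap SELECT: per Main_Id, ordered by Start_Date,
    compare each Start_Date with the previous row's End_Date (LAG)."""
    by_main = {}
    for row in sorted(rows, key=lambda r: (r[4], r[0])):
        by_main.setdefault(row[4], []).append(row)
    gaps = []
    for main_id, group in by_main.items():
        for prev, cur in zip(group, group[1:]):
            prev_end_date, start_date = prev[1], cur[0]
            # QUALIFY prev_end_date < Start_Date
            if prev_end_date < start_date:
                gaps.append((prev_end_date, start_date, "OK", main_id))
    return gaps


print(lag_gaps(ROWS))
```

On the sample data this finds the single gap from 20.05.2020 to 22.05.2020.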

My current output looks like this:

01.01.2020  03.05.2020  NOK
01.02.2020  15.05.2020  BAD
10.05.2020  20.05.2020  NOK
22.05.2020  25.05.2020  BAD
20.05.2020  22.05.2020  OK
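The overlap in this output comes from NORMALIZE merging only rows whose non-period columns (here Decision and Main_Id) are identical; it never splits a NOK period where a BAD period overlaps it, so DOOMED can never appear. A Python sketch of that merge (hypothetical names, assuming End_Date is never NULL) reproduces the four non-gap rows above:

```python
from datetime import date

# Sample rows from table1: (Start_Date, End_Date, Status, Id, Main_Id).
ROWS = [
    (date(2020, 1, 1), date(2020, 5, 3), 0, 11, 1),
    (date(2020, 2, 1), date(2020, 4, 14), 3, 12, 1),
    (date(2020, 4, 14), date(2020, 5, 15), 5, 13, 1),
    (date(2020, 5, 10), date(2020, 5, 20), 0, 14, 1),
    (date(2020, 5, 22), date(2020, 5, 25), 2, 15, 1),
]


def normalize_per_decision(rows):
    """Sketch of NORMALIZE ON MEETS OR OVERLAPS: within each
    (Decision, Main_Id) group, merge periods that meet or overlap."""
    groups = {}
    for start, end, status, _id, main_id in rows:
        decision = "NOK" if status == 0 else "BAD"
        groups.setdefault((decision, main_id), []).append((start, end))
    result = []
    for (decision, main_id), periods in groups.items():
        periods.sort()
        cur_start, cur_end = periods[0]
        for start, end in periods[1:]:
            if start <= cur_end:  # meets (==) or overlaps (<)
                cur_end = max(cur_end, end)
            else:
                result.append((cur_start, cur_end, decision, main_id))
                cur_start, cur_end = start, end
        result.append((cur_start, cur_end, decision, main_id))
    return sorted(result)


for row in normalize_per_decision(ROWS):
    print(row)
```

The NOK row 01.01.2020 to 03.05.2020 and the BAD row 01.02.2020 to 15.05.2020 both survive intact, which is why the result has overlapping periods instead of a DOOMED slice.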

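This is my first time using the Teradata PERIOD data type, so I'm still learning. I also tried joining to a calendar table to get one row per day of each period, but that didn't cover the gaps and the spool grew too large: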

select c.calendar_date,
       table1.Main_Id,
       max(case when table1.Status = 0 then 1 else 0 end) as NOK,
       max(case when table1.Status > 0 then 1 else 0 end) as BAD
from table1
join table_calendar c
  on table1.Start_Date <= c.calendar_date
 and (table1.End_Date > c.calendar_date or table1.End_Date is null)
group by c.calendar_date, table1.Main_Id
;
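A per-day approach needs every calendar day to appear even when no table1 row covers it, which is what a left join from the calendar table would give; the inner join above drops those days, so the gaps disappear. The Python sketch below (names hypothetical, End_Date assumed non-NULL) iterates over every day between two bounds instead, which does produce gap days with both flags at 0, but also shows the cost: 145 rows for a single Main_Id over less than five months, before anything is collapsed back into periods.

```python
from datetime import date, timedelta

# Sample rows from table1: (Start_Date, End_Date, Status, Id, Main_Id).
ROWS = [
    (date(2020, 1, 1), date(2020, 5, 3), 0, 11, 1),
    (date(2020, 2, 1), date(2020, 4, 14), 3, 12, 1),
    (date(2020, 4, 14), date(2020, 5, 15), 5, 13, 1),
    (date(2020, 5, 10), date(2020, 5, 20), 0, 14, 1),
    (date(2020, 5, 22), date(2020, 5, 25), 2, 15, 1),
]


def daily_flags(rows, main_id, first_day, last_day):
    """One (day, NOK, BAD) tuple per day in [first_day, last_day).
    Days with no active row are kept, as a calendar LEFT JOIN would."""
    out = []
    day = first_day
    while day < last_day:
        active = [s for st, en, s, _id, m in rows if m == main_id and st <= day < en]
        nok = int(any(s == 0 for s in active))
        bad = int(any(s > 0 for s in active))
        out.append((day, nok, bad))
        day += timedelta(days=1)
    return out


days = daily_flags(ROWS, 1, date(2020, 1, 1), date(2020, 5, 25))
print(len(days))  # 145
```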

So I'm still struggling to get the desired result with good performance. Any help would be appreciated. Thanks in advance!

Solution

This is a really tricky problem. My answer is based on a similar question, and it can be simplified with some newer syntax:

with all_ranges as
 ( -- create ranges based on all start/end dates
   select  
      dt,Main_Id,period(dt,lead(dt) over(partition by Main_Id order by dt)) as pd 
   from table1
    -- split rows into begin/end
   unpivot(dt for col in(Start_Date as '1',End_Date as '-1')) as p
   group by Main_Id,dt
   qualify pd is not null -- remove last row
 )
select normalize
   coalesce(ar.pd P_INTERSECT PERIOD(t1.Start_Date,t1.End_Date),ar.pd) AS ValidPeriod,
   ar.Main_Id,
   case max(case when Status = 0 then 1 else 0 end)
      + max(case when Status > 0 then 2 else 0 end)
     when 0 then 'OK'     -- no row, cover time gaps between periods
     when 1 then 'NOK'    -- Status 0 exists (and no status >0 at the same time)
     when 2 then 'BAD'    -- Status >0 exists (but no status 0 at the same time)
     when 3 then 'DOOMED' -- Status >0 and 0 exist together
   end AS Decision
from all_ranges as ar 
left join table1 as t1
  on ar.Main_Id = t1.Main_Id
 -- match the ranges to existing rows
 and ar.pd overlaps PERIOD(t1.Start_Date,t1.End_Date)
group by ar.Main_Id,ValidPeriod
order by 1;
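The query works in three steps: every distinct Start_Date/End_Date becomes a boundary and consecutive boundaries form elementary ranges (all_ranges), each range is matched to the rows that overlap it and classified, and NORMALIZE merges adjacent ranges with the same decision. The following Python sketch mirrors those steps on the sample data so the logic can be checked outside Teradata; the function and variable names are hypothetical, and it assumes End_Date is never NULL, as in the sample data.

```python
from datetime import date

# Sample rows from table1: (Start_Date, End_Date, Status, Id, Main_Id).
ROWS = [
    (date(2020, 1, 1), date(2020, 5, 3), 0, 11, 1),
    (date(2020, 2, 1), date(2020, 4, 14), 3, 12, 1),
    (date(2020, 4, 14), date(2020, 5, 15), 5, 13, 1),
    (date(2020, 5, 10), date(2020, 5, 20), 0, 14, 1),
    (date(2020, 5, 22), date(2020, 5, 25), 2, 15, 1),
]
DECISION_BY_MASK = {0: "OK", 1: "NOK", 2: "BAD", 3: "DOOMED"}


def decide_periods(rows):
    result = []
    for main_id in sorted({r[4] for r in rows}):
        mine = [r for r in rows if r[4] == main_id]
        # Step 1 (all_ranges): distinct start/end dates are the boundaries;
        # each pair of consecutive boundaries is one elementary range.
        bounds = sorted({d for r in mine for d in (r[0], r[1])})
        merged = []
        for lo, hi in zip(bounds, bounds[1:]):
            # Step 2 (LEFT JOIN ... OVERLAPS): statuses of rows whose
            # half-open period [Start_Date, End_Date) overlaps [lo, hi).
            active = [r[2] for r in mine if r[0] < hi and lo < r[1]]
            mask = (1 if any(s == 0 for s in active) else 0) + (
                2 if any(s > 0 for s in active) else 0
            )
            decision = DECISION_BY_MASK[mask]
            # Step 3 (NORMALIZE): extend the previous range when it has
            # the same decision and ends where this one begins.
            if merged and merged[-1][2] == decision and merged[-1][1] == lo:
                merged[-1] = (merged[-1][0], hi, decision, main_id)
            else:
                merged.append((lo, hi, decision, main_id))
        result.extend(merged)
    return result


for row in decide_periods(ROWS):
    print(row)
```

Because every boundary comes from some row's start or end, each elementary range lies either entirely inside or entirely outside each row's period, so a single overlap test per row is enough to classify it. On the sample data the result matches the desired output, including the DOOMED range 01.02.2020 to 03.05.2020 built from two adjacent elementary ranges.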