Breaking a running sum into groups of a maximum size/length

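I am trying to break a running (ordered) sum into groups that do not exceed a maximum size. When I implement the following example logic...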

IF OBJECT_ID(N'tempdb..#t') IS NOT NULL DROP TABLE #t;

SELECT TOP (ABS(CHECKSUM(NewId())) % 1000)
    ROW_NUMBER() OVER (ORDER BY name) AS ID,
    LEFT(CAST(NEWID() AS NVARCHAR(100)), ABS(CHECKSUM(NewId())) % 30) AS Description
INTO #t
FROM sys.objects;

DECLARE @maxGroupSize INT;
SET @maxGroupSize = 100;

;WITH t AS (
    SELECT
        *,
        LEN(Description) AS DescriptionLength,
        SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID) AS [RunningLength],
        SUM(LEN(Description)) OVER (/*PARTITION BY N/A */ ORDER BY ID) / @maxGroupSize AS GroupID
    FROM #t
)
SELECT *, SUM(DescriptionLength) OVER (PARTITION BY GroupID) AS SumOfGroup
FROM t
ORDER BY GroupID, ID;

...I get groups that are larger than the maximum group size (length) of 100.
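
One likely reason for the overshoot: dividing the running total by the maximum cuts the total at fixed multiples of 100, and a row whose length straddles a boundary is counted entirely in the bucket where its running total ends. The following is a minimal Python sketch of the same arithmetic. The list `lengths` is made up for illustration and is not taken from the random data in the question.

```python
from collections import defaultdict
from itertools import accumulate

MAX_GROUP_SIZE = 100
lengths = [60, 35, 20, 40, 44, 10]  # hypothetical description lengths

group_sums = defaultdict(int)
for length, running in zip(lengths, accumulate(lengths)):
    # Same arithmetic as RunningLength / @maxGroupSize (integer division)
    group_id = running // MAX_GROUP_SIZE
    group_sums[group_id] += length

print(dict(group_sums))  # {0: 95, 1: 104, 2: 10}
```

Group 1 sums to 104 because the row of length 20 starts at running total 95 and ends at 115, so all 20 units land in group 1 even though 5 of them fall before the boundary.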

Answer

A recursive common table expression (rCTE) is one way to solve this.

Sample data

A small, fixed sample data set.

create table data
(
  id int,
  description nvarchar(20)
);

insert into data (id, description) values
( 1, 'qmlsdkjfqmsldk'),    ( 2, 'mldskjf'),
( 3, 'qmsdlfkqjsdm'),      ( 4, 'fmqlsdkfq'),
( 5, 'qdsfqsdfqq'),        ( 6, 'mds'),
( 7, 'qmsldfkqsjdmfqlkj'), ( 8, 'qdmsl'),
( 9, 'mqlskfjqmlkd'),      (10, 'qsdqfdddffd');

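Solution

At every recursion step, a case expression evaluates (r.group_running_length + len(d.description) <= @group_max_length) to decide whether the current row extends the previous group or starts a new one.

The group target size is set to 40 to better fit the sample data.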

declare @group_max_length int = 40;

with rcte as
(
  -- anchor: the first row opens group 1
  select d.id,
         d.description,
         len(d.description) as description_length,
         len(d.description) as running_length,
         1 as group_id,
         len(d.description) as group_running_length
  from data d
  where d.id = 1
union all
  -- recursive step: fetch the next row (relies on consecutive ids)
  select d.id,
         d.description,
         len(d.description),
         r.running_length + len(d.description),
         case
           when r.group_running_length + len(d.description) <= @group_max_length
           then r.group_id       -- row still fits: extend the current group
           else r.group_id + 1   -- row does not fit: start a new group
         end,
         case
           when r.group_running_length + len(d.description) <= @group_max_length
           then r.group_running_length + len(d.description)
           else len(d.description)
         end
  from rcte r
  join data d
    on d.id = r.id + 1
)
select r.id, r.description, r.description_length, r.running_length,
       r.group_id, r.group_running_length, gs.group_sum
from rcte r
cross apply ( select max(r2.group_running_length) as group_sum
              from rcte r2
              where r2.group_id = r.group_id ) gs -- group sum
order by r.id;
-- Note: SQL Server stops a recursive CTE after 100 recursion levels by default;
-- for longer inputs, add option (maxrecursion 0) after the order by clause.

Result

Contains both the running group length and the group sum for every row.

id  description       description_length  running_length  group_id  group_running_length  group_sum
--  ----------------  ------------------  --------------  --------  --------------------  ---------
1   qmlsdkjfqmsldk     14                   14            1         14                    33
2   mldskjf             7                   21            1         21                    33
3   qmsdlfkqjsdm       12                   33            1         33                    33
4   fmqlsdkfq           9                   42            2          9                    39
5   qdsfqsdfqq         10                   52            2         19                    39
6   mds                 3                   55            2         22                    39
7   qmsldfkqsjdmfqlkj  17                   72            2         39                    39
8   qdmsl               5                   77            3          5                    28
9   mqlskfjqmlkd       12                   89            3         17                    28
10  qsdqfdddffd        11                  100            3         28                    28
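
The grouping in the table above can be cross-checked outside the database. The following is a minimal Python sketch of the same greedy rule and is not part of the original answer. The function name `assign_groups` is made up for illustration, and the sketch assumes the rows are already ordered by id.

```python
def assign_groups(lengths, group_max_length):
    """Greedy grouping: extend the current group while the next row still fits."""
    rows = []
    group_id, group_running = 1, 0
    for i, length in enumerate(lengths):
        # The first row always opens group 1, as in the anchor of the rCTE.
        if i > 0 and group_running + length > group_max_length:
            group_id += 1       # row does not fit: start a new group
            group_running = 0
        group_running += length
        rows.append((group_id, group_running))
    return rows


descriptions = [
    "qmlsdkjfqmsldk", "mldskjf", "qmsdlfkqjsdm", "fmqlsdkfq", "qdsfqsdfqq",
    "mds", "qmsldfkqsjdmfqlkj", "qdmsl", "mqlskfjqmlkd", "qsdqfdddffd",
]
rows = assign_groups([len(d) for d in descriptions], 40)

# The last running value seen for each group is that group's total.
group_sums = {}
for group_id, group_running in rows:
    group_sums[group_id] = group_running

print(rows)
print(group_sums)  # {1: 33, 2: 39, 3: 28}
```

Each row's group depends on where the previous group ended, so the assignment is inherently sequential. That is presumably why the SQL version relies on recursion instead of a single window function.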

Fiddle to see it in action (includes a version with random data).
