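How do I calculate an average over X weeks in an Azure Data Explorer table with weekly entries? Is there an option other than a self-join?
I have a table in which each row belongs to a week. There are multiple rows for the same week, but they are unique based on several dimensions.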
| Week | Col1 | Col2 |
|---|---|---|
| W1 | X1 | a |
| W1 | X2 | b |
| W2 | X3 | a |
| ... | ... | ... |

(more rows)
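I want to calculate the average of Col1 over 4 weeks (or, in general, X weeks).
I know I can do this by joining the table with itself 4 times, but that doesn't feel right... Is there a better way?
Sample input data table: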
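datatable (Week:datetime, Value:decimal, Dim1:string)
[
    datetime(2020-08-03), 1, "a", datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a", datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b", datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 2, "a", datetime(2020-08-24), 4, "b",
    datetime(2020-08-31), 3, "c"
]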
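The result I expect is below. In this example, each average is labeled with the last week of its 4-week window, so it covers that week and the three weeks before it. If a week has no value, I assume the value is 0. Also, if a dimension appears in any of the weeks, it is included in the final averages (this doesn't happen in practice, but I'm including it for completeness):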
| Week | Average_Value | Dim1 | Note |
|---|---|---|---|
| 2020-08-03 | 0.25 | a | backfilled with zeros |
| 2020-08-03 | 0.5 | b | |
| 2020-08-10 | 0.5 | a | |
| 2020-08-10 | 0.75 | b | |
| 2020-08-17 | 0.5 | a | |
| 2020-08-17 | 1.25 | b | |
| 2020-08-17 | 0.5 | c | |
| 2020-08-24 | 1 | a | |
| 2020-08-24 | 2.25 | b | |
| 2020-08-24 | 0.5 | c | has an average even with no value in this week |
| 2020-08-31 | 0.75 | a | has an average even with no value in this week |
| 2020-08-31 | 1.75 | b | has an average even with no value in this week |
| 2020-08-31 | 1.25 | c | |
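The expected semantics can be pinned down with a small Python model. It is not KQL and it is not from the original post. For every week on a dense weekly grid and every dimension, it keeps a running sum of the current week and the three previous weeks, divides that sum by 4, and counts missing cells as 0. The sample rows are assumed to match the sample `datatable`. `trailing_average` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEK = timedelta(days=7)

# Assumed sample rows (Week, Value, Dim1), mirroring the sample datatable.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def trailing_average(rows, window=4):
    """Per (week, dim): mean of this week and the previous window-1 weeks,
    over a dense weekly grid, counting missing (week, dim) cells as 0."""
    values = defaultdict(float)
    for week, value, dim in rows:
        values[(week, dim)] += value
    first = min(w for w, _, _ in rows)
    last = max(w for w, _, _ in rows)
    weeks = [first + i * WEEK for i in range((last - first) // WEEK + 1)]
    dims = sorted({d for _, _, d in rows})

    result = {}
    for dim in dims:
        running = 0.0  # sum of the values currently inside the window
        for i, week in enumerate(weeks):
            running += values.get((week, dim), 0.0)  # week entering the window
            if i >= window:
                running -= values.get((weeks[i - window], dim), 0.0)  # week leaving it
            result[(week, dim)] = running / window
    return result


result = trailing_average(ROWS)
print(result[(date(2020, 8, 31), "b")])  # 1.75
```

Because the grid is dense, this model also produces 0 averages for `c` in the first two weeks. Those rows are not listed in the table above.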
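Here is my approach using joins:
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a", datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a", datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b", datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 2, "a", datetime(2020-08-24), 4, "b",
    datetime(2020-08-31), 3, "c"
];
// Full grid: every Week x every Dim1, with the original Value where it exists (null otherwise)
let FullTable = Test
| summarize by Week
| extend A = 1
| join kind=fullouter (Test | summarize by Dim1 | extend A = 1) on A
| join kind=leftouter (Test) on Week, Dim1
| project-away Week1, Dim11, A, A1;
// Join copies of the grid shifted forward by 1, 2 and 3 weeks
FullTable
| join kind=leftouter (FullTable | extend Week = Week + 7d) on Week, Dim1
| join kind=leftouter (FullTable | extend Week = Week + 14d) on Week, Dim1
| join kind=leftouter (FullTable | extend Week = Week + 21d) on Week, Dim1
| project Week, Dim1,
    Value0 = iff(isnull(Value), 0.0, Value), Value1 = iff(isnull(Value1), 0.0, Value1),
    Value2 = iff(isnull(Value2), 0.0, Value2), Value3 = iff(isnull(Value3), 0.0, Value3)
| extend Average = (Value0 + Value1 + Value2 + Value3) / 4
| project-away Value0, Value1, Value2, Value3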
It works, but it seems like there should be a better way.
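The four joins differ only in their shift (0d, 7d, 14d, 21d). That means the same plan should extend to X weeks by generating the shifts in a loop, instead of writing X joins by hand. The Python model below follows the same steps: a cross join of weeks and dimensions, a left outer join with the data, then one lookup per shifted copy, with null mapped to 0. It is not KQL, and `self_join_average` and the sample rows are illustrative assumptions.

```python
from datetime import date, timedelta

WEEK = timedelta(days=7)

# Assumed sample rows (Week, Value, Dim1), mirroring the sample datatable.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def self_join_average(rows, x_weeks=4):
    # FullTable: every distinct Week x every distinct Dim1 (the cross join on A = 1),
    # left-outer-joined with the data; None stands in for a KQL null.
    data = {(w, d): v for w, v, d in rows}  # (Week, Dim1) assumed unique
    weeks = sorted({w for w, _, _ in rows})
    dims = sorted({d for _, _, d in rows})
    full_table = {(w, d): data.get((w, d)) for w in weeks for d in dims}

    result = {}
    for week, dim in full_table:
        total = 0.0
        for k in range(x_weeks):
            # Copy k is FullTable shifted by k*7d; joining it on (Week, Dim1)
            # fetches the value from k weeks earlier, or null if there is none.
            shifted = full_table.get((week - k * WEEK, dim))
            total += 0.0 if shifted is None else shifted  # iff(isnull(...), 0.0, ...)
        result[(week, dim)] = total / x_weeks
    return result


result = self_join_average(ROWS)
```

The cost is one join per week in the window. That cost is what the answer's approaches below avoid.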
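Solution
Below are two suggestions, inspired by aggregations over a sliding window. The idea is to expand each value forward across the analysis period (28d). A value recorded in week W is copied to W, W+7d, W+14d and W+21d, capped at the last reported week. After that, every week already holds all the rows its 4-week average needs.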
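The "expand forward" step can be modeled in Python as a sketch. This is not the answer's KQL. Each row is emitted once for every week it contributes to: its own week and the next three, capped at the last reported week. That is roughly what `range(...)` followed by `mv-expand` does. The copies are then summed per (week, Dim1) and divided by 4. `expand_forward`, `sliding_average`, `END` and the sample rows are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEK = timedelta(days=7)
END = date(2020, 8, 31)  # assumed last reported week, playing the role of _end

# Assumed sample rows (Week, Value, Dim1), mirroring the sample datatable.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def expand_forward(rows, end, window=4):
    """Yield (target_week, dim, value, weeks_back) for each week a row contributes to."""
    for week, value, dim in rows:
        for k in range(window):
            target = week + k * WEEK
            if target > end:
                break
            yield target, dim, value, k


def sliding_average(rows, end, window=4):
    sums = defaultdict(float)
    for target, dim, value, _ in expand_forward(rows, end, window):
        sums[(target, dim)] += value
    return {key: total / window for key, total in sums.items()}


result = sliding_average(ROWS, END)
```

Only (week, dim) pairs that receive at least one copy appear in the result. So `c` has no rows for the first two weeks, which matches the Option 1 output.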
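Option 1:
let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d;   // last week to report (2020-08-31 for the sample data)
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a", datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a", datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b", datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 2, "a", datetime(2020-08-24), 4, "b",
    datetime(2020-08-31), 3, "c"
];
Test
| order by Dim1 asc, Week asc
| extend _bin = bin_at(Week, 7d, _start)
// Last week this value contributes to: _bin + 21d, capped at _end
| extend _endRange = iif(_bin + _period > _end, _end, iff(_bin + _period - 7d < _start, _start, iff(_bin + _period - 7d < _bin, _bin, _bin + _period - 7d)))
// One row per week whose 4-week window contains this value
| extend _range = range(_bin, _endRange, 7d)
| mv-expand _range to typeof(datetime)
// 0 = same week, 1 = one week earlier, ..., 3 = three weeks earlier
| extend WeekNum = toint((_range - Week) / 7d)
| project Week = _range, Dim1, Value, WeekNum = strcat("Value", WeekNum)
| evaluate pivot(WeekNum, sum(Value))
| project Week, Dim1, Average = (Value0 + Value1 + Value2 + Value3) / 4
| order by Week asc, Dim1 asc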
|Week|Dim1|Average|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
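In Option 1, the `WeekNum` / `pivot` step sends each expanded copy into one of four columns. `Value0` holds the same week and `Value3` holds three weeks earlier, and the average is their sum divided by 4. The Python model below is a sketch of that step; `pivot_weeknum` and `average_from_pivot` are hypothetical names and the sample rows are assumed. The average is only right if an empty cell counts as 0, so the model fills it with `0.0` explicitly.

```python
from datetime import date, timedelta

WEEK = timedelta(days=7)
END = date(2020, 8, 31)  # assumed last reported week, playing the role of _end

# Assumed sample rows (Week, Value, Dim1), mirroring the sample datatable.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def pivot_weeknum(rows, end, window=4):
    """Map (Week, Dim1) -> {"Value<k>": summed value from k weeks earlier}."""
    table = {}
    for week, value, dim in rows:
        for k in range(window):
            target = week + k * WEEK
            if target > end:
                break
            cells = table.setdefault((target, dim), {})
            column = f"Value{k}"  # strcat("Value", WeekNum)
            cells[column] = cells.get(column, 0.0) + value  # sum(Value)
    return table


def average_from_pivot(table, window=4):
    # (Value0 + Value1 + Value2 + Value3) / 4, with empty cells counted as 0
    return {key: sum(cells.get(f"Value{k}", 0.0) for k in range(window)) / window
            for key, cells in table.items()}


table = pivot_weeknum(ROWS, END)
print(sorted(table[(date(2020, 8, 31), "a")].items()))  # [('Value1', 2.0), ('Value3', 1.0)]
averages = average_from_pivot(table)
```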
Option 2:
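let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d;
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a", datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a", datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b", datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 2, "a", datetime(2020-08-24), 4, "b",
    datetime(2020-08-31), 3, "c"
];
let _dims = Test | distinct Dim1;
// For every week, 4 distinct placeholder origins with Value = 0.0
let _fullRange = range Week from _start to _end step 7d
| extend _firstOffset = max_of(-3, -((Week - _start) / 7d))
| extend _range = range(_firstOffset, _firstOffset + 3, 1)
| mv-expand _range to typeof(int)
| project Week, _origin = Week + _range * 7d
| extend K = 1, Value = 0.0;
let _fullRangeDims = _dims | extend K = 1 | join kind=inner (_fullRange) on K | project-away K;
_fullRangeDims
| join kind=fullouter
    (Test
    | order by Dim1 asc, Week asc
    | extend _bin = bin_at(Week, 7d, _start)
    | extend _endRange = iif(_bin + _period > _end, _end, iff(_bin + _period - 7d < _start, _start, iff(_bin + _period - 7d < _bin, _bin, _bin + _period - 7d)))
    | extend _range = range(_bin, _endRange, 7d)
    | mv-expand _range to typeof(datetime)
    | project Week = _range, Dim1, Value, _origin = Week) on Week, _origin, Dim1
// Real values replace the 0.0 placeholders, so each (Week, Dim1) averages over 4 rows
| project Week = coalesce(Week1, Week), Dim1 = coalesce(Dim11, Dim1), Value = coalesce(Value1, Value), _origin = coalesce(_origin1, _origin)
| summarize avg(Value) by Week, Dim1
| order by Week asc, Dim1 asc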
|Week|Dim1|avg_Value|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-03 00:00:00.0000000|c|0|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-10 00:00:00.0000000|c|0|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
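Option 2 avoids the pivot by giving every (week, Dim1) pair exactly four placeholder rows with value 0, each keyed by a distinct origin week. The real, forward-expanded rows overwrite the placeholders that share their key, so a plain `avg` gives the zero-backfilled 4-week mean. The Python model below is a sketch. It uses a dict overwrite in place of the `fullouter` join plus `coalesce`. `placeholder_average`, `START`, `END` and the sample rows are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEK = timedelta(days=7)
START, END = date(2020, 8, 3), date(2020, 8, 31)  # assumed, like _start and _end

# Assumed sample rows (Week, Value, Dim1), mirroring the sample datatable.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def placeholder_average(rows, start, end, window=4):
    dims = sorted({d for _, _, d in rows})
    n_weeks = (end - start) // WEEK + 1
    # _fullRange x _dims: for each week, `window` distinct origins with Value = 0.0.
    cells = {}
    for i in range(n_weeks):
        week = start + i * WEEK
        first_offset = max(-(window - 1), -i)  # max_of(-3, -((Week - _start) / 7d))
        for offset in range(first_offset, first_offset + window):
            for dim in dims:
                cells[(week, week + offset * WEEK, dim)] = 0.0
    # Expanded data keyed (Week, _origin, Dim1); on a key match the real value
    # wins, like coalesce(Value1, Value) after the fullouter join.
    for origin, value, dim in rows:
        for k in range(window):
            target = origin + k * WEEK
            if target > end:
                break
            cells[(target, origin, dim)] = value
    groups = defaultdict(list)
    for (week, _origin, dim), value in cells.items():
        groups[(week, dim)].append(value)
    # summarize avg(Value) by Week, Dim1
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


result = placeholder_average(ROWS, START, END)
```

Unlike Option 1, this model returns a row (with average 0) for every week and dimension, including `c` in the first two weeks. That matches the Option 2 output above.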