In Azure Data Explorer, is there an alternative to self-joins for computing an average over X weeks in a table with weekly entries?

I have a table in which each row belongs to a week. A week can have multiple rows, but each row is unique across a few dimensions.

| Week | Col1 | Col2 |
----------------------
|  W1  |  X1  |   a  |
|  W1  |  X2  |   b  |
|  W2  |  X3  |   a  |
.
... More rows

I want to compute the average of Col1 over 4 weeks (or, more generally, X weeks).

I know I can do this by joining the table with itself 4 times, but that doesn't seem right. Is there a better way?

Sample input datatable:

datatable (Week:datetime,Value:decimal,Dim1:string)
    [datetime(2020-08-03),1,"a",datetime(2020-08-03),2,"b",datetime(2020-08-10),datetime(2020-08-17),"c",datetime(2020-08-24),4,datetime(2020-08-31),3,"c"]

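The result I expect is below (in this example I use the last day of the window as the "day" of the average). Note that if a week has no value, I treat it as 0. Also, if a dimension showed up in any week, it is included in the final averages (this wouldn't happen in practice, but I've added it for completeness):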

| Week       | Average_Value | Dim1 |
-------------------------------------
| 2020-08-03 | 0.25          | a    | <-- backfill with zeroes
| 2020-08-03 | 0.5           | b    |
| 2020-08-10 | 0.5           | a    |
| 2020-08-10 | 0.75          | b    |
| 2020-08-17 | 1             | b    |
| 2020-08-17 | 0.5           | c    |
| 2020-08-27 | 1             | a    |
| 2020-08-27 | 2.25          | b    |
| 2020-08-27 | 0.5           | c    | <-- has average values even with no value in week
| 2020-08-31 | 0.75          | a    | <-- has average values even with no value in week
| 2020-08-31 | 1.75          | b    | <-- has average values even with no value in week
| 2020-08-31 | 1.25          | c    |
-------------------------------------
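
The rule described above (a trailing window of X weeks, with missing weeks counted as 0) can be written down outside Kusto as a reference to check query results against. The following is a minimal Python sketch, not part of the original post. The `ROWS` values are an assumption: the sample datatable above is truncated, so they were inferred from the averages in the answer's output tables. The function name `trailing_average` is hypothetical.

```python
from datetime import date, timedelta

# ASSUMPTION: sample rows (week start, value, dimension) inferred from the
# answer's output tables, because the datatable in the post is truncated.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def trailing_average(rows, weeks=4):
    """Average per (week, dimension) over that week and the weeks-1 before it.

    A week with no row for a dimension contributes 0 to the window.
    """
    step = timedelta(days=7)
    # Sum the values per (week, dimension) so duplicate rows are handled.
    totals = {}
    for week, value, dim in rows:
        totals[(week, dim)] = totals.get((week, dim), 0.0) + value

    all_weeks = sorted({week for week, _, _ in rows})
    dims = sorted({dim for _, _, dim in rows})

    result = {}
    for week in all_weeks:
        for dim in dims:
            # Look back 0..weeks-1 weeks, defaulting to 0 when nothing exists.
            window = [totals.get((week - k * step, dim), 0.0) for k in range(weeks)]
            result[(week, dim)] = sum(window) / weeks
    return result


if __name__ == "__main__":
    for (week, dim), avg in sorted(trailing_average(ROWS).items()):
        print(week, dim, avg)
```

Changing `weeks` generalizes the window to any X without adding another join per week. This sketch emits a row for every week and dimension combination, including dimensions that have not appeared yet.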

Here is my approach using joins:

let Test = datatable (Week:datetime,Value:real,Dim1: string)
    [datetime(2020-08-03),"c"];
let FullTable = Test
| summarize by Week
| extend A = 1
| join kind=fullouter (Test | summarize by Dim1 | extend A = 1) on A
| join kind=leftouter (Test) on Week,Dim1
| project-away Week1,Dim11,A,A1;
FullTable
| join kind=leftouter (FullTable | extend Week = Week + 7d) on Week,Dim1
| join kind=leftouter (FullTable | extend Week = Week + 14d) on Week,Dim1
| join kind=leftouter (FullTable | extend Week = Week + 21d) on Week,Dim1
| project Week,Dim1,Value0 = iff(isnull(Value),0.0,Value),Value1 = iff(isnull(Value1),0.0,Value1),Value2 = iff(isnull(Value2),0.0,Value2),Value3 = iff(isnull(Value3),0.0,Value3)
| extend Average = (Value0 + Value1 + Value2 + Value3)/4
| project-away Value0,Value1,Value2,Value3

It works, but it seems like there should be a better way.

Solution

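See the following 2 options, both inspired by aggregations over sliding window. The idea is to expand each value forward to the end of its analysis period (28d), so that every weekly value is counted in each window it belongs to.

Option 1: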

let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d; 
let Test = datatable (Week:datetime,Value:real,Dim1: string)
    [datetime(2020-08-03),1,"a",datetime(2020-08-03),2,"b",datetime(2020-08-10),datetime(2020-08-17),"c",datetime(2020-08-24),4,datetime(2020-08-31),3,"c"];
Test 
| order by Dim1 asc,Week asc 
| extend _bin = bin_at(Week,7d,_start) 
| extend _endRange = iif(_bin + _period > _end,_end,iff( _bin + _period - 7d < _start,_start,iff( _bin + _period - 7d < _bin,_bin,_bin + _period - 7d)))  
| extend _range = range(_bin,_endRange,7d) 
| mv-expand _range to typeof(datetime) 
| extend WeekNum = toint((_range - Week)/7d)
| project Week=_range,Dim1,Value,WeekNum=strcat("Value",WeekNum)
| evaluate pivot(WeekNum,sum(Value))
| project Week,Dim1,Average = (Value0 + Value1 + Value2 + Value3)/4

|Week|Dim1|Average|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
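
The expand-then-aggregate idea behind this option can be sketched in Python. This is an illustration of the technique, not a translation of the query. `ROWS` is an assumption (inferred from the output table above, since the datatable is truncated), and `expand_and_average` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta

# ASSUMPTION: sample rows (week start, value, dimension) inferred from the
# output table above, because the datatable in the post is truncated.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def expand_and_average(rows, weeks=4, last_week=None):
    """Copy each row forward into every weekly window that contains it.

    Plays the role of range + mv-expand in the query: a value observed in
    week W is counted in the windows ending at W, W+7d, ..., W+(weeks-1)*7d.
    """
    step = timedelta(days=7)
    if last_week is None:
        last_week = max(week for week, _, _ in rows)

    sums = defaultdict(float)
    for week, value, dim in rows:
        for k in range(weeks):
            target = week + k * step
            if target <= last_week:  # do not report weeks past the data
                sums[(target, dim)] += value

    # Divide by the fixed window length so absent weeks count as 0.
    return {key: total / weeks for key, total in sorted(sums.items())}


if __name__ == "__main__":
    for (week, dim), avg in expand_and_average(ROWS).items():
        print(week, dim, avg)
```

Because only existing rows are expanded, a dimension produces no output row until its first value appears. This matches the 13 rows in the table above, where `c` is absent for the first two weeks.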

Option 2:

let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d; 
let Test = datatable (Week:datetime,"c"];
let _dims = Test | distinct Dim1;
let _fullRange = range Week from _start to _end step 7d 
 | extend _start = max_of(-3,-((Week-_start)/7d))
 | extend _range = range((_start),(_start+3),1) | mv-expand _range to typeof(int) | project Week,_origin = Week + _range*7d | extend K=1,Value=0.0 ;
let _fullRangeDims = _dims | extend K=1 | join kind=inner (_fullRange) on K | project-away K;
_fullRangeDims
| join kind=fullouter
(Test 
| order by Dim1 asc,7d) 
| mv-expand _range to typeof(datetime) 
| project Week=_range,_origin = Week) on Week,_origin,Dim1
| project Week=coalesce(Week1,Week),Dim1=coalesce(Dim11,Dim1),Value=coalesce(Value1,Value),_origin= coalesce(_origin1,_origin)
| summarize avg(Value) by Week,Dim1
| order by Week asc,Dim1 asc

|Week|Dim1|avg_Value|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-03 00:00:00.0000000|c|0|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-10 00:00:00.0000000|c|0|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
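
This output has a row for every week and dimension, including `c` with an average of 0 before it first appears. That follows from the zero-valued placeholder rows built for each week and dimension before the real data is joined in. The following Python sketch illustrates that scaffold idea under stated assumptions; it is not a line-by-line translation of the query. `ROWS` is inferred from the output table, because the datatable is truncated, and `scaffold_average` is a hypothetical name.

```python
from datetime import date, timedelta
from itertools import product

# ASSUMPTION: sample rows (week start, value, dimension) inferred from the
# output table above, because the datatable in the post is truncated.
ROWS = [
    (date(2020, 8, 3), 1.0, "a"), (date(2020, 8, 3), 2.0, "b"),
    (date(2020, 8, 10), 1.0, "a"), (date(2020, 8, 10), 1.0, "b"),
    (date(2020, 8, 17), 2.0, "b"), (date(2020, 8, 17), 2.0, "c"),
    (date(2020, 8, 24), 2.0, "a"), (date(2020, 8, 24), 4.0, "b"),
    (date(2020, 8, 31), 3.0, "c"),
]


def scaffold_average(rows, weeks=4):
    """Average over a zero-filled grid of (week, dimension, window slot)."""
    step = timedelta(days=7)
    all_weeks = sorted({week for week, _, _ in rows})
    dims = sorted({dim for _, _, dim in rows})

    # One zero placeholder per reporting week, dimension and window slot, so
    # every (week, dimension) pair always averages over exactly `weeks` values.
    slots = {
        (week, dim, k): 0.0
        for week, dim, k in product(all_weeks, dims, range(weeks))
    }

    # Overlay the real data: a row from week `origin` fills slot k of the
    # window reported at origin + k weeks.
    for origin, value, dim in rows:
        for k in range(weeks):
            key = (origin + k * step, dim, k)
            if key in slots:
                slots[key] += value

    return {
        (week, dim): sum(slots[(week, dim, k)] for k in range(weeks)) / weeks
        for week, dim in product(all_weeks, dims)
    }


if __name__ == "__main__":
    for (week, dim), avg in sorted(scaffold_average(ROWS).items()):
        print(week, dim, avg)
```

The placeholders cost more rows up front, but a plain average then divides by the right count without hard-coding the window length in the final step.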
