How can I solve for percentiles of a dataset grouped by month?
Whew. This was a real brain teaser. First, the table schema I used for testing was:
Create Table scores
(
    Id int not null identity(1,1) primary key clustered
    , [Date] datetime not null
    , score int not null
)
Next, I calculated the values in SQL 2008 using CTEs to check my answers, and then built a solution that should also work in SQL 2000. So, in SQL 2008, we do the following:
;With
    SummaryStatistics As
    (
        Select Year([Date]) As YearNum
            , Month([Date]) As MonthNum
            , Min(score) As Minscore
            , Max(score) As Maxscore
            , Avg(score) As Avgscore
        From scores
        Group By Month([Date]), Year([Date])
    )
    , percentiles As
    (
        Select Year([Date]) As YearNum
            , Month([Date]) As MonthNum
            , score
            , NTile( 100 ) Over ( Partition By Month([Date]), Year([Date]) Order By score ) As percentile
        From scores
    )
    , Reportedpercentiles As
    (
        Select YearNum, MonthNum
            , Min(Case When percentile = 45 Then score End) As percentile45
            , Min(Case When percentile = 55 Then score End) As percentile55
        From percentiles
        Where percentile In(45,55)
        Group By YearNum, MonthNum
    )
Select SS.YearNum, SS.MonthNum
    , SS.Minscore, SS.Maxscore, SS.Avgscore
    , RP.percentile45, RP.percentile55
From SummaryStatistics As SS
    Join Reportedpercentiles As RP
        On RP.YearNum = SS.YearNum
            And RP.MonthNum = SS.MonthNum
Order By SS.YearNum, SS.MonthNum
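For anyone unfamiliar with NTile, here is a minimal Python sketch of how NTile(100) hands out bucket numbers within one month, following the documented rule that the larger buckets come first. The function names `ntile` and `reported_percentile` are made up for this illustration and are not part of the query above. One thing it makes visible: a month with fewer than 45 rows has no bucket 45 at all, so it yields no row in Reportedpercentiles and drops out of the inner join.

```python
def ntile(sorted_scores, buckets=100):
    """Pair each score (already sorted ascending) with its 1-based bucket number."""
    n = len(sorted_scores)
    base, extra = divmod(n, buckets)   # the first `extra` buckets hold one more row
    result = []
    index = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for score in sorted_scores[index:index + size]:
            result.append((score, bucket))
        index += size
    return result


def reported_percentile(scores, wanted, buckets=100):
    """Smallest score in bucket `wanted`, or None if that bucket is empty."""
    in_bucket = [s for s, b in ntile(sorted(scores), buckets) if b == wanted]
    return min(in_bucket) if in_bucket else None
```

With 200 scores every bucket holds two rows, so the values reported for buckets 45 and 55 are simply the 89th and 109th smallest scores of that month.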
Now for a SQL 2000 solution. In essence, the trick is to use a couple of temporary tables to tally the occurrences of each score.
If object_id('tempdb..#Working') is not null
    DROP TABLE #Working
GO
Create Table #Working
(
    YearNum int not null
    , MonthNum int not null
    , score int not null
    , Occurances int not null
    , Constraint PK_#Working Primary Key Clustered ( MonthNum, YearNum, score )
)
GO
Insert #Working(MonthNum, YearNum, score, Occurances)
Select Month([Date]), Year([Date]), score, Count(*)
From scores
Group By Month([Date]), Year([Date]), score
GO
If object_id('tempdb..#SummaryStatistics') is not null
    DROP TABLE #SummaryStatistics
GO
Create Table #SummaryStatistics
(
    MonthNum int not null
    , YearNum int not null
    , score int not null
    , Occurances int not null
    , Cumulativetotal int not null
    , percentile float null
    , Constraint PK_#SummaryStatistics Primary Key Clustered ( MonthNum, YearNum, score )
)
GO
-- Cumulativetotal = number of rows in the same month with a strictly lower score
Insert #SummaryStatistics(YearNum, MonthNum, score, Occurances, Cumulativetotal)
Select W2.YearNum, W2.MonthNum, W2.score, W2.Occurances, Sum(W1.Occurances)-W2.Occurances
From #Working As W1
    Join #Working As W2
        On W2.YearNum = W1.YearNum
            And W2.MonthNum = W1.MonthNum
Where W1.score <= W2.score
Group By W2.YearNum, W2.MonthNum, W2.score, W2.Occurances

-- Scale each cumulative count to 0-100, relative to the month's largest cumulative count
Update #SummaryStatistics
Set percentile = SS.Cumulativetotal * 100.0 / MonthTotal.Total
From #SummaryStatistics As SS
    Join (
        Select SS1.YearNum, SS1.MonthNum, Max(SS1.Cumulativetotal) As Total
        From #SummaryStatistics As SS1
        Group By SS1.YearNum, SS1.MonthNum
    ) As MonthTotal
        On MonthTotal.YearNum = SS.YearNum
            And MonthTotal.MonthNum = SS.MonthNum
Select GeneralStats.*, percentiles.percentile45, percentiles.percentile55
From (
    Select Year(S1.[Date]) As YearNum
        , Month(S1.[Date]) As MonthNum
        , Min(S1.score) As Minscore
        , Max(S1.score) As Maxscore
        , Avg(S1.score) As Avgscore
    From scores As S1
    Group By Month(S1.[Date]), Year(S1.[Date])
) As GeneralStats
    Join (
        Select SS1.YearNum, SS1.MonthNum
            , Min(Case When SS1.percentile >= 45 Then score End) As percentile45
            , Min(Case When SS1.percentile >= 55 Then score End) As percentile55
        From #SummaryStatistics As SS1
        Group By SS1.YearNum, SS1.MonthNum
    ) As percentiles
        On percentiles.YearNum = GeneralStats.YearNum
            And percentiles.MonthNum = GeneralStats.MonthNum
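To make the temp-table arithmetic easier to follow, here is a rough Python sketch of the same steps for the scores of a single month: count the occurrences of each score, work out how many rows fall strictly below it, scale that to 0-100, then take the smallest score at or above each target. The function name `percentile_scores` is hypothetical. The guard for a month with only one distinct score is my own addition; as far as I can tell the SQL above would attempt a division by zero in that case.

```python
from collections import Counter


def percentile_scores(scores, targets=(45, 55)):
    """Mirror the temp-table steps for the scores of a single month."""
    occurrences = Counter(scores)        # plays the role of #Working
    cumulative = {}                      # rows strictly below each score
    running = 0
    for score in sorted(occurrences):
        cumulative[score] = running
        running += occurrences[score]
    total = max(cumulative.values())     # Max(Cumulativetotal) in the SQL
    if total == 0:
        # Only one distinct score this month: nothing to scale against.
        return {t: None for t in targets}
    percentile = {s: c * 100.0 / total for s, c in cumulative.items()}
    return {
        t: min((s for s, p in percentile.items() if p >= t), default=None)
        for t in targets
    }
```

Note that this scale puts the lowest score at 0 and the highest at 100, so it will not always agree with the NTile buckets of the SQL 2008 version.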
Original question
I've got a SQL table with a whole load of records that look like this:
| Date       | Score |
+------------+-------+
| 01/01/2010 |     4 |
| 02/01/2010 |     6 |
| 03/01/2010 |    10 |
...
| 16/03/2010 |     2 |
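I plot this on a chart, so I get a nice line across the graph showing the score over time. Lovely.
Now, what I need to do is include the average score on the chart, so we can see how that changes over time. I can simply add this to the mix: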
SELECT
    YEAR(SCOREDATE) 'Year', MONTH(SCOREDATE) 'Month',
    MIN(SCORE) MinScore, AVG(SCORE) AverageScore, MAX(SCORE) MaxScore
FROM SCORES
GROUP BY YEAR(SCOREDATE), MONTH(SCOREDATE)
ORDER BY YEAR(SCOREDATE), MONTH(SCOREDATE)
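No problem so far.
The problem is, how can I easily calculate the percentiles for each time period? I'm not sure that's the correct phrase. What I need in total is:
- A line on the chart for the scores (easy)
- A line on the chart for the averages (easy)
- A line on the chart showing the band that 95% of the scores occupy (stumped)
It's the third one that I don't get. I need to calculate the 5% percentiles, which I can do one at a time: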
SELECT MAX(SubQ.SCORE) FROM
    (SELECT TOP 45 PERCENT SCORE
     FROM SCORES
     WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
     ORDER BY SCORE ASC) AS SubQ

SELECT MIN(SubQ.SCORE) FROM
    (SELECT TOP 45 PERCENT SCORE
     FROM SCORES
     WHERE YEAR(SCOREDATE) = 2010 AND MONTH(SCOREDATE) = 1
     ORDER BY SCORE DESC) AS SubQ
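As a sanity check on what the two single-month queries above return, here is a small Python sketch of the same logic. It assumes the SQL Server behaviour that TOP n PERCENT rounds a fractional row count up to the next whole row. The helper name `top_percent` and the sample scores are made up for illustration.

```python
import math


def top_percent(scores, percent, descending=False):
    """Rows a TOP (percent) PERCENT ... ORDER BY SCORE query would keep."""
    ordered = sorted(scores, reverse=descending)
    keep = math.ceil(len(ordered) * percent / 100)   # fractional row counts round up
    return ordered[:keep]


january = [4, 6, 10, 2, 8, 7, 5, 9, 3, 1]               # made-up sample scores
lower = max(top_percent(january, 45))                    # first query: MAX of the bottom 45%
upper = min(top_percent(january, 45, descending=True))   # second query: MIN of the top 45%
```

With ten rows, 45 percent is 4.5 rows, which rounds up to five, so the two results sit either side of the middle of the sorted list.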
But I can't work out how to get a table covering all the months.
| Date       | Average | 45% | 55% |
+------------+---------+-----+-----+
| 01/01/2010 |      13 |  11 |  15 |
| 02/01/2010 |      10 |   8 |  12 |
| 03/01/2010 |       5 |   4 |  10 |
...
| 16/03/2010 |       7 |   7 |   9 |
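At the moment I'm going to have to load this lot into my app and calculate the figures myself, or run a large number of individual queries and collate the results.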
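For reference, the application-side fallback could look roughly like the Python sketch below. It assumes the rows arrive as (date, score) pairs and reuses the TOP 45 PERCENT idea from the single-month queries, with fractional row counts rounded up. The function name `monthly_band` and the result keys are invented for this example.

```python
import math
from collections import defaultdict
from datetime import date


def monthly_band(rows, percent=45):
    """Group (date, score) rows by month and apply both TOP-PERCENT queries to each."""
    by_month = defaultdict(list)
    for day, score in rows:
        by_month[(day.year, day.month)].append(score)

    report = {}
    for month, scores in sorted(by_month.items()):
        scores.sort()
        keep = math.ceil(len(scores) * percent / 100)   # TOP n PERCENT rounds up
        report[month] = {
            "average": sum(scores) / len(scores),
            "lower": scores[keep - 1],                  # MAX of the bottom `percent`
            "upper": scores[-keep],                     # MIN of the top `percent`
        }
    return report
```

Unlike AVG on an int column in SQL Server, the average here is not truncated to a whole number.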