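How to compute and rank 5-day moving averages across multiple columns of event data in pandas
I need to find and rank, from highest to lowest, the 5-day averages of hourly data, using several pandas columns that characterize an "event". My data in the df dataframe looks like the sample below. The result should show the consecutive-day "events" with the highest 5-day average wind, the lowest temperature, and the highest relative humidity. I'm not sure how to compute the consecutive 5-day averages, or how to then rank by multiple criteria. Thanks!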
site year month day wind temp rh
0 A 1991 1 1 5.3 2.1 80.4
1 A 1991 1 2 12.6 -1.4 85.0
2 A 1991 1 3 14.7 -2.6 95.1
3 A 1991 1 4 11.8 4.8 57.3
4 A 1991 1 5 5.2 2.9 45.9
5 A 1991 1 6 3.9 4.3 52.1
6 A 1991 1 7 2.6 5.8 34.7
7 A 1991 1 8 2.9 5.7 29.2
8 A 1991 1 9 10.4 1.4 69.4
9 A 1991 1 10 14.6 -0.9 72.1
10 A 1991 1 11 13.9 -1.6 84.6
11 A 1991 1 12 14.5 -5.1 87.2
12 A 1991 1 13 12.8 -6.7 80.9
13 A 1991 1 14 8.4 -4.3 54.3
14 A 1991 1 15 5.7 0.7 44.8
I tried different rolling-mean options such as the one below, but got a "list assignment index out of range" error:
df['rolling_wind','rolling_t','rolling_rh'] = df.groupby(['wind','temp','rh']).rolling(window=5).mean()
The 5-day rolling averages should look like this:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
6 A 1991 1 7 7.64 3.04 57.02
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
9 A 1991 1 10 6.88 3.26 51.5
10 A 1991 1 11 8.88 2.08 58
11 A 1991 1 12 11.26 -0.1 68.5
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
14 A 1991 1 15 11.06 -3.4 70.36
And the final output should look like this, prioritized in the order wind, temp, rh:
site year month day wind temp rh
0 A 1991 1 1 n/a n/a n/a
1 A 1991 1 2 n/a n/a n/a
2 A 1991 1 3 n/a n/a n/a
3 A 1991 1 4 n/a n/a n/a
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.1 68.5
14 A 1991 1 15 11.06 -3.4 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.6 67.08
10 A 1991 1 11 8.88 2.08 58
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.5
7 A 1991 1 8 5.28 4.7 43.84
8 A 1991 1 9 5 4.02 46.26
Solution
Try a rolling mean plus sort_values, with na_position set to 'first':
import pandas as pd
d = {'site': {0: 'A',1: 'A',2: 'A',3: 'A',4: 'A',5: 'A',6: 'A',7: 'A',8: 'A',9: 'A',10: 'A',11: 'A',12: 'A',13: 'A',14: 'A'},'year': {0: 1991,1: 1991,2: 1991,3: 1991,4: 1991,5: 1991,6: 1991,7: 1991,8: 1991,9: 1991,10: 1991,11: 1991,12: 1991,13: 1991,14: 1991},'month': {0: 1,1: 1,2: 1,3: 1,4: 1,5: 1,6: 1,7: 1,8: 1,9: 1,10: 1,11: 1,12: 1,13: 1,14: 1},'day': {0: 1,1: 2,2: 3,3: 4,4: 5,5: 6,6: 7,7: 8,8: 9,9: 10,10: 11,11: 12,12: 13,13: 14,14: 15},'wind': {0: 5.3,1: 12.6,2: 14.7,3: 11.8,4: 5.2,5: 3.9,6: 2.6,7: 2.9,8: 10.4,9: 14.6,10: 13.9,11: 14.5,12: 12.8,13: 8.4,14: 5.7},'temp': {0: 2.1,1: -1.4,2: -2.6,3: 4.8,4: 2.9,5: 4.3,6: 5.8,7: 5.7,8: 1.4,9: -0.9,10: -1.6,11: -5.1,12: -6.7,13: -4.3,14: 0.7},'rh': {0: 80.4,1: 85.0,2: 95.1,3: 57.3,4: 45.9,5: 52.1,6: 34.7,7: 29.2,8: 69.4,9: 72.1,10: 84.6,11: 87.2,12: 80.9,13: 54.3,14: 44.8}}
df = pd.DataFrame(data=d)
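cols = ['wind','temp','rh']
# Replace each column with its 5-row rolling mean; the first 4 rows become NaN
df[cols] = df[cols].rolling(window=5).mean()
# Sort by wind, then temp, then rh (all descending), keeping the NaN rows on top
df = df.sort_values(cols,ascending=False,na_position='first')
print(df)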
df:
site year month day wind temp rh
0 A 1991 1 1 NaN NaN NaN
1 A 1991 1 2 NaN NaN NaN
2 A 1991 1 3 NaN NaN NaN
3 A 1991 1 4 NaN NaN NaN
12 A 1991 1 13 13.24 -2.58 78.84
13 A 1991 1 14 12.84 -3.72 75.82
11 A 1991 1 12 11.26 -0.10 68.50
14 A 1991 1 15 11.06 -3.40 70.36
4 A 1991 1 5 9.92 1.16 72.74
5 A 1991 1 6 9.64 1.60 67.08
10 A 1991 1 11 8.88 2.08 58.00
6 A 1991 1 7 7.64 3.04 57.02
9 A 1991 1 10 6.88 3.26 51.50
7 A 1991 1 8 5.28 4.70 43.84
8 A 1991 1 9 5.00 4.02 46.26
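The question asks for the highest wind, the lowest temperature, and the highest relative humidity, while the code above sorts all three columns in descending order. If the temp tiebreak should run lowest-first, sort_values accepts one ascending flag per column. The sketch below is only an illustration under that assumption; the small `rolled` frame is made-up data with a tie in wind so the tiebreak is visible, not the asker's data.

```python
import pandas as pd

# Hypothetical rolling-mean values; rows 0 and 1 tie on wind on purpose.
rolled = pd.DataFrame({
    "wind": [9.0, 9.0, 7.5, None],
    "temp": [1.0, -2.0, 3.0, None],
    "rh": [60.0, 55.0, 70.0, None],
})

ranked = rolled.sort_values(
    ["wind", "temp", "rh"],
    ascending=[False, True, False],  # highest wind, lowest temp, highest rh
    na_position="first",             # keep the incomplete-window rows on top
)
print(ranked)
```

With the sample data in the question no two wind averages are equal, so the temp and rh directions would not change the order there; they only matter when the leading column ties.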
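Alternatively, print the rolling mean of each column separately:
import pandas as pd
df = pd.read_csv(csv_name)  # csv_name: path to your CSV file (left as a placeholder in the original answer)
# print(df)
print("wind:",df["wind"].rolling(5).mean().round(1))
print("temp:",df["temp"].rolling(5).mean().round(1))
print("rh:",df["rh"].rolling(5).mean().round(1))
Try this. Thanks!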
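The snippet above reads from a CSV file and prints each column on its own. As a rough sketch, the three columns can also be averaged in a single call. The CSV text below is an inline stand-in built from the first six rows of the question's sample; the comma-separated layout and the use of io.StringIO are assumptions made so the example runs without a file on disk.

```python
import io
import pandas as pd

# Inline stand-in for a CSV file (assumed comma-separated layout).
csv_text = """site,year,month,day,wind,temp,rh
A,1991,1,1,5.3,2.1,80.4
A,1991,1,2,12.6,-1.4,85.0
A,1991,1,3,14.7,-2.6,95.1
A,1991,1,4,11.8,4.8,57.3
A,1991,1,5,5.2,2.9,45.9
A,1991,1,6,3.9,4.3,52.1
"""
df = pd.read_csv(io.StringIO(csv_text))

cols = ["wind", "temp", "rh"]
# One rolling call over all three columns; rows 0-3 have no full window yet.
rolled = df[cols].rolling(5).mean().round(2)
print(rolled)
```

Rounding to two decimals rather than one keeps the values comparable with the expected table in the question.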
Alternatively, as a single expression (NaN rows sort last by default):
>>> df[['wind','temp','rh']].rolling(5).mean().sort_values(['wind','rh'],ascending=False)
wind temp rh
12 13.24 -2.58 78.84
13 12.84 -3.72 75.82
11 11.26 -0.10 68.50
14 11.06 -3.40 70.36
4 9.92 1.16 72.74
5 9.64 1.60 67.08
10 8.88 2.08 58.00
6 7.64 3.04 57.02
9 6.88 3.26 51.50
7 5.28 4.70 43.84
8 5.00 4.02 46.26
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
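The sample contains only site A. If the real data holds several sites (an assumption, since the question shows just one), a plain rolling window would average across site boundaries. One possible sketch is to group by site before rolling; note that the grouping key is the site, not the measured columns as in the attempt in the question. The frame below is made-up data.

```python
import pandas as pd

# Hypothetical two-site frame, six days each.
df = pd.DataFrame({
    "site": ["A"] * 6 + ["B"] * 6,
    "day": list(range(1, 7)) * 2,
    "wind": [5.0, 10.0, 15.0, 10.0, 5.0, 15.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

cols = ["wind"]
rolled = (
    df.groupby("site")[cols]
      .rolling(window=5)
      .mean()
      .reset_index(level=0, drop=True)  # drop the site level so the index lines up with df
)
df[cols] = rolled  # assignment aligns on the original row index

# Rank within each site: highest 5-day mean first, incomplete windows on top.
ranked = df.sort_values(["site", "wind"], ascending=[True, False], na_position="first")
print(ranked)
```

Each site now starts its own window, so the first four rows of every site are NaN instead of borrowing values from the previous site.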