Pandas: resampling monthly data into custom-frequency seasonal data

Background

I have a monthly dataset and want to resample it into seasonal data by summing the monthly values.

The seasons are defined as:
(Dec,Jan,Feb),(Mar,Apr,May),(June,July,Aug,Sep),(Oct,Nov)

Data

import numpy as np
import pandas as pd
dti = pd.date_range("2015-12-31",periods=11,freq="M")
df = pd.DataFrame({'time':dti,'data':np.random.rand(len(dti))})

Output:
        time    data
0   2015-12-31  0.466245
1   2016-01-31  0.959309
2   2016-02-29  0.445139
3   2016-03-31  0.575556
4   2016-04-30  0.303020
5   2016-05-31  0.591516
6   2016-06-30  0.001410
7   2016-07-31  0.338360
8   2016-08-31  0.540705
9   2016-09-30  0.115278
10  2016-10-31  0.950359

Code

I can resample every season except December, January and February (DJF). This is what I did for the other seasons:

MAM = df.loc[df['time'].dt.month.between(3,5)].resample('Y',on='time').sum()

Because between cannot express DJF (the months wrap around the end of the year), I used a boolean condition instead.

mask = (df['time'].dt.month>11) | (df['time'].dt.month<=2)
DJF = df.loc[mask].resample('3M',origin='start',on='time').sum()

Problem

Even though I passed origin='start', this resampling keeps my first data point, "2015-12-31", separate and starts the next bin in 2016. So my questions are basically:

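How do I fix the resampling?
I feel there must be a more direct, simpler way to do this than a boolean condition. Also, is there something like df['time'].dt.month.between that works on the index? I tried df.index.month.between, but between is not available on the integer index that df.index.month returns. I find repeatedly calling df.set_index and df.reset_index annoying.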

Solution

Try mapping each month to a season value, then group by season and resample within each group:

df['season'] = df['time'].dt.month.map({
    12: 0,1: 0,2: 0,3: 1,4: 1,5: 1,6: 2,7: 2,8: 2,9: 2,10: 3,11: 3
})

df = df.groupby('season').resample('Y',on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2015-12-31  0.221993
1       0 2016-12-31  1.077451
2       1 2016-12-31  2.018766
3       2 2016-12-31  1.768848
4       3 2016-12-31  0.080741
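
The month-to-season dictionary above can also be generated from the season definitions instead of being typed out entry by entry. The following is a minimal sketch; the names seasons and month_to_season and the string labels are illustrative and do not come from the original answer.

```python
import pandas as pd

# Season definitions from the question; the string labels are illustrative.
seasons = {
    "DJF": [12, 1, 2],
    "MAM": [3, 4, 5],
    "JJAS": [6, 7, 8, 9],
    "ON": [10, 11],
}

# Invert to month -> season so the dict can be passed to map().
month_to_season = {m: name for name, months in seasons.items() for m in months}

# Sanity check: every calendar month is covered.
assert sorted(month_to_season) == list(range(1, 13))

# Month-end dates matching the question's sample range.
dti = pd.date_range("2015-12-31", periods=11, freq=pd.offsets.MonthEnd())
labels = dti.month.map(month_to_season)
print(list(labels))
```

Writing the seasons as lists keeps the definition close to how the question states it, which may make a missing or misassigned month easier to spot.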

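To treat the preceding December as part of the following year, add MonthBegin from pandas.tseries.offsets to shift December 2015 into January 2016, then shift every key in the season mapping forward by one month to match: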

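from pandas.tseries.offsets import MonthBegin

# Start from the original df. Each date moves to the first day of the next month.
df['time'] = df['time'] + MonthBegin(1)
# Month keys are shifted forward by one to match (Dec-Feb is now 1-3, and so on).
df['season'] = df['time'].dt.month.map({
    1: 0,2: 0,3: 0,4: 1,5: 1,6: 1,7: 2,8: 2,9: 2,10: 2,11: 3,12: 3
})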

df = df.groupby('season').resample('Y',on='time')['data'].sum().reset_index()

df:

   season       time      data
0       0 2016-12-31  1.299445
1       1 2016-12-31  2.018766
2       2 2016-12-31  1.768848
3       3 2016-12-31  0.080741
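
A variant that leaves the timestamps untouched is to compute the year each row's season belongs to and group on that directly. This is only a sketch under stated assumptions: the season_year column is a hypothetical helper that is not part of the original answer, and the data values are copied from the answer's sample table rather than regenerated.

```python
import pandas as pd

# Values copied from the answer's sample data table.
dti = pd.date_range("2015-12-31", periods=11, freq=pd.offsets.MonthEnd())
values = [0.221993, 0.870732, 0.206719, 0.918611, 0.488411, 0.611744,
          0.765908, 0.518418, 0.296801, 0.187721, 0.080741]
df = pd.DataFrame({"time": dti, "data": values})

# Same month -> season codes as the first mapping in the answer.
month = df["time"].dt.month
df["season"] = month.map({12: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1,
                          6: 2, 7: 2, 8: 2, 9: 2, 10: 3, 11: 3})

# Hypothetical helper column: December counts toward the following year.
df["season_year"] = df["time"].dt.year + (month == 12).astype(int)

# One row per (season_year, season) pair, with the monthly values summed.
seasonal = df.groupby(["season_year", "season"], as_index=False)["data"].sum()
print(seasonal)
```

With the sample data this should give one row per season for 2016, with the same sums as the table above. The 'time' column keeps its original month-end dates because nothing is shifted.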

Sample data used:

np.random.seed(5)
dti = pd.date_range("2015-12-31",periods=11,freq="M")
df = pd.DataFrame({'time': dti,'data': np.random.rand(len(dti))})

df:

         time      data
0  2015-12-31  0.221993
1  2016-01-31  0.870732
2  2016-02-29  0.206719
3  2016-03-31  0.918611
4  2016-04-30  0.488411
5  2016-05-31  0.611744
6  2016-06-30  0.765908
7  2016-07-31  0.518418
8  2016-08-31  0.296801
9  2016-09-30  0.187721
10 2016-10-31  0.080741
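
On the side question about filtering by month when the dates are in the index: one option that appears to work is Index.isin for arbitrary month sets, plus plain comparisons for contiguous ranges. A minimal sketch, assuming the dates are already the DataFrame index; the placeholder data column is illustrative.

```python
import pandas as pd

# Dates are the index here, with placeholder data values.
dti = pd.date_range("2015-12-31", periods=11, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({"data": range(len(dti))}, index=dti)

# isin handles month sets that wrap around the year end, such as DJF.
djf_mask = df.index.month.isin([12, 1, 2])
djf = df[djf_mask]

# A contiguous range such as March to May can use two comparisons.
m = df.index.month
mam = df[(m >= 3) & (m <= 5)]

print(djf)
print(mam)
```

Neither selection needs set_index or reset_index, and the same isin call also works on a column via df['time'].dt.month.isin(...).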
