如何解决在Python的时间范围内重新采样
我想通过df
到0
的时间范围内为所有列创建每月数据并用2019-01-01
填充缺失值来对2019-12-31
进行重新采样。
df
:
ITEM_ID Date Value YearMonth
0 101002 2019-03-31 1.0 2019-03
1 101002 2019-04-30 1.0 2019-04
2 101002 2019-10-31 0.0 2019-10
3 101002 2019-11-30 8.0 2019-11
4 101002 2019-12-31 5.0 2019-12
预期输出:
ITEM_ID Date Value YearMonth
... 0 2019-01 (added)
... 0 2019-02 (added)
0 101002 2019-03-31 1.0 2019-03
1 101002 2019-04-30 1.0 2019-04
... 0 2019-05 (added)
... 0 2019-06 (added)
... 0 2019-07 (added)
... 0 2019-08 (added)
... 0 2019-09 (added)
2 101002 2019-10-31 0.0 2019-10
3 101002 2019-11-30 8.0 2019-11
4 101002 2019-12-31 5.0 2019-12
我遇到了几种方法,例如multiindex
和resample
。 multiindex
似乎用途广泛,但是在涉及不同级别的索引时会变得有些复杂。我不确定resample
是否允许我将效果扩展到指定的时间范围。最好的方法是什么?
解决方法
这是解决方案
import pandas as pd
df1= # this is the dataframe which you have given example. please change accordingly.
print(df1)
data=[['2019-01'],['2019-02'],['2019-03'],['2019-04'],['2019-05'],['2019-06'],['2019-07'],['2019-08'],['2019-09'],['2019-10'],['2019-11'],['2019-12']]
df2=pd.DataFrame(data=data,columns=['YearMonth'])
print(df2)
final_DF = pd.merge(df1,df2,on ='YearMonth',how ='outer').sort_values('YearMonth')
final_DF = final_DF.fillna(0)
print(final_DF)
,
我们没有考虑年和月列,而是创建了一个带有开始和结束日期和时间的空数据框,并将其与原始数据框组合在一起。
df['Date'] = pd.to_datetime(df['Date'])
df1 = pd.DataFrame(index=pd.to_datetime(pd.date_range('2019-01-01','2020-01-01',freq='1M'))).reset_index()
df1 = df1.merge(df,left_on='index',right_on='Date',how='outer')
df1['yearmonth'] = df1['index'].apply(lambda x: str(x.year) + '-' + '{:02}'.format(x.month))
df1
index ITEM_ID Date Value YearMonth yearmonth
0 2019-01-31 NaN NaT NaN NaN 2019-01
1 2019-02-28 NaN NaT NaN NaN 2019-02
2 2019-03-31 101002.0 2019-03-31 1.0 2019-03 2019-03
3 2019-04-30 101002.0 2019-04-30 1.0 2019-04 2019-04
4 2019-05-31 NaN NaT NaN NaN 2019-05
5 2019-06-30 NaN NaT NaN NaN 2019-06
6 2019-07-31 NaN NaT NaN NaN 2019-07
7 2019-08-31 NaN NaT NaN NaN 2019-08
8 2019-09-30 NaN NaT NaN NaN 2019-09
9 2019-10-31 101002.0 2019-10-31 0.0 2019-10 2019-10
10 2019-11-30 101002.0 2019-11-30 8.0 2019-11 2019-11
11 2019-12-31 101002.0 2019-12-31 5.0 2019-12 2019-12
,
我认为您需要DataFrame.reindex
:
df['YearMonth'] = pd.to_datetime(df['YearMonth'])
r = pd.to_datetime(pd.date_range('2019-01-01',freq='1MS'))
mux = pd.MultiIndex.from_product([df['ITEM_ID'].unique(),r],names=['ITEM_ID','YearMonth'])
df = df.set_index(['ITEM_ID','YearMonth']).reindex(mux).fillna({'Value':0}).reset_index().reindex(df.columns,axis=1)
print (df)
ITEM_ID Date Value YearMonth
0 101002 NaN 0.0 2019-01-01
1 101002 NaN 0.0 2019-02-01
2 101002 2019-03-31 1.0 2019-03-01
3 101002 2019-04-30 1.0 2019-04-01
4 101002 NaN 0.0 2019-05-01
5 101002 NaN 0.0 2019-06-01
6 101002 NaN 0.0 2019-07-01
7 101002 NaN 0.0 2019-08-01
8 101002 NaN 0.0 2019-09-01
9 101002 2019-10-31 0.0 2019-10-01
10 101002 2019-11-30 8.0 2019-11-01
11 101002 2019-12-31 5.0 2019-12-01
12 101002 NaN 0.0 2020-01-01
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。