How to group by and filter a dataset
df:

     fruit   year  price  vol  signifiance
0    apple   2010      1    5
1    apple   2011      2    4
2    apple   2012      3    3
3    apple   2013      3    3
4    apple   2014      3    3
5    apple   2015      3    3  important
...
47   banana  2010      1    4
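If a fruit-year is marked important, I want to run a regression on price using the data from the 5 years before and the 5 years after that important fruit-year.
For example, apple is marked important in 2015, so the regression for apple would use the data from 2010 to 2020.
I tried: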
df = df.groupby('significance')
Y = df['price']
X = df['vol']
model = sm.OLS(Y,X)
Solution
I believe you need:
import statsmodels.api as sm

g = df.groupby('fruit')
for group in g.groups.keys():
    df1 = g.get_group(group)
    # years of the rows marked important for this fruit
    years = df1.loc[df1['signifiance'].eq('important'), 'year']
    print(years)
    # for each important year, take the rows from 5 years before to 5 years after
    for year in years:
        data = df1[df1['year'].between(year - 5, year + 5)]
        print(data)
        # fit only if the window returned some rows
        if not data.empty:
            X = data['vol']
            Y = data['price']
            # note: sm.OLS does not add an intercept on its own
            model = sm.OLS(Y, X)
            results = model.fit()
            print(results.summary())
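One thing worth knowing about the loop above: sm.OLS does not add an intercept automatically, so with X = data['vol'] the fitted model is price = b * vol through the origin. As a rough illustration in plain Python, using the six apple rows shown in the question (the function name slope_through_origin is made up for this sketch), that single coefficient is sum(x*y) / sum(x*x):

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx

# apple rows 2010-2015 from the question: vol is the regressor, price the target
vol = [5, 4, 3, 3, 3, 3]
price = [1, 2, 3, 3, 3, 3]

b = slope_through_origin(vol, price)
print(round(b, 4))  # 49 / 77 -> 0.6364
```

If an intercept is wanted, statsmodels offers sm.add_constant(X), which adds a constant column to X before the fit.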
Edit:
import statsmodels.api as sm

def f(df1):
    # work on a copy so the group passed in by groupby is not modified in place
    df1 = df1.copy()
    m1 = df1['signifiance'].eq('important')
    years = df1.loc[m1, 'year']
    print(years)
    # for each important year, take the rows from 5 years before to 5 years after
    for year in years:
        # keep only rows inside the window that have both vol and price
        mask = df1['year'].between(year - 5, year + 5) & df1['vol'].notna() & df1['price'].notna()
        data = df1[mask]
        # print(data)
        # fit only if the window returned some rows
        if not data.empty:
            X = data['vol']
            Y = data['price']
            model = sm.OLS(Y, X)
            results = model.fit()
            # print(results.params)
            # store the fitted coefficient on the important rows inside the window
            df1.loc[mask & m1, 'new'] = results.params.iat[0]
    return df1

df = df.groupby('fruit').apply(f)
print(df)
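To try the idea end to end on a small input, here is a self-contained sketch. The toy data, the function name window_slopes, the half_width parameter, and the result columns are all assumptions made for illustration. It also swaps sm.OLS for the closed-form through-origin slope sum(vol*price) / sum(vol**2), so it needs only pandas, and it returns one row per important fruit-year instead of writing into the original frame.

```python
import pandas as pd

# toy data (made up for illustration); the column name is spelled as in the question
df = pd.DataFrame({
    "fruit": ["apple"] * 6 + ["banana"] * 3,
    "year": [2010, 2011, 2012, 2013, 2014, 2015, 2010, 2011, 2012],
    "price": [1, 2, 3, 3, 3, 3, 1, 2, 2],
    "vol": [5, 4, 3, 3, 3, 3, 4, 4, 5],
    "signifiance": [None] * 5 + ["important"] + [None] * 3,
})

def window_slopes(frame, half_width=5):
    """Return one row per important fruit-year with the through-origin slope."""
    rows = []
    for fruit, grp in frame.groupby("fruit"):
        years = grp.loc[grp["signifiance"].eq("important"), "year"]
        for year in years:
            # rows within half_width years of the important year, with usable values
            win = grp[grp["year"].between(year - half_width, year + half_width)]
            win = win.dropna(subset=["vol", "price"])
            sxx = (win["vol"] ** 2).sum()
            if win.empty or sxx == 0:
                continue  # nothing to fit in this window
            slope = (win["vol"] * win["price"]).sum() / sxx
            rows.append({"fruit": fruit, "year": int(year),
                         "n_obs": len(win), "slope": slope})
    return pd.DataFrame(rows, columns=["fruit", "year", "n_obs", "slope"])

result = window_slopes(df)
print(result)
```

Returning a separate table keeps the per-window results easy to inspect; it can be merged back onto df on fruit and year if the coefficient is needed next to the original rows.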