微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

多变量回归statsmodels.api

如何解决多变量回归statsmodels.api

我已经浏览了文档,但仍然无法弄清楚。我想运行具有多个回归的WLS。

statsmodels.api作为sm导入

单个变量的示例。

X = Height
Y = Weight

res = sm.OLS(Y,X,).fit() 
res.summary()

说我也有:

X2 =年龄

如何将其添加到回归中?

解决方法

您可以将它们放入data.frame中并调出列(这样,输出也看起来更好):

import statsmodels.api as sm
import pandas as pd
import numpy as np

Height = np.random.uniform(0,1,100)
Weight = np.random.uniform(0,100)
Age = np.random.uniform(0,30,100)

df = pd.DataFrame({'Height':Height,'Weight':Weight,'Age':Age})

res = sm.OLS(df['Height'],df[['Weight','Age']]).fit()

In [10]: res.summary()
Out[10]: 
<class 'statsmodels.iolib.summary.Summary'>
"""
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                 Height   R-squared (uncentered):                   0.700
Model:                            OLS   Adj. R-squared (uncentered):              0.694
Method:                 Least Squares   F-statistic:                              114.3
Date:                Mon,24 Aug 2020   Prob (F-statistic):                    2.43e-26
Time:                        15:54:30   Log-Likelihood:                         -28.374
No. Observations:                 100   AIC:                                      60.75
Df Residuals:                      98   BIC:                                      65.96
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Weight         0.1787      0.090      1.988      0.050       0.000       0.357
Age            0.0229      0.003      8.235      0.000       0.017       0.028
==============================================================================
Omnibus:                        2.938   Durbin-Watson:                   1.813
Prob(Omnibus):                  0.230   Jarque-Bera (JB):                2.223
Skew:                          -0.211   Prob(JB):                        0.329
Kurtosis:                       2.404   Cond. No.                         49.7
==============================================================================
,

我使用二阶多项式来预测身高和年龄如何影响士兵的体重。你可以在我的 GitHub 上获取 ansur_2_m.csv。

 df=pd.read_csv('ANSUR_2_M.csv',encoding = "ISO-8859-1",usecols=['Weightlbs','Heightin','Age'],dtype={'Weightlbs':np.integer,'Heightin':np.integer,'Age':np.integer})
 df=df.dropna()
 df.reset_index()
 df['Heightin2']=df['Heightin']**2
 df['Age2']=df['Age']**2

 formula="Weightlbs ~ Heightin+Heightin2+Age+Age2"
 model_ols = smf.ols(formula,data=df).fit()
 minHeight=df['Heightin'].min()
 maxHeight=df['Heightin'].max()
 avgAge = df['Age'].median()
 print(minHeight,maxHeight,avgAge)

 df2=pd.DataFrame()

 df2['Heightin']=np.linspace(60,100,50)
 df2['Heightin2']=df2['Heightin']**2
 df2['Age']=28
 df2['Age2']=df['Age']**2

 df3=pd.DataFrame()
 df3['Heightin']=np.linspace(60,50)
 df3['Heightin2']=df2['Heightin']**2
 df3['Age']=45
 df3['Age2']=df['Age']**2

 prediction28=model_ols.predict(df2)
 prediction45=model_ols.predict(df3)

 plt.clf()
 plt.plot(df2['Heightin'],prediction28,label="Age 28")
 plt.plot(df3['Heightin'],prediction45,label="Age 45")
 plt.ylabel="Weight lbs"
 plt.xlabel="Height in"
 plt.legend()
 plt.show()

 print('A 45 year old soldier is more probable to weight more than an 28 year old soldier')

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。