Python forward stepwise regression: 'not in index'

How to fix 'not in index' in Python forward stepwise regression

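I am working through some tutorials on the Boston housing data, with the help of a few stepwise-selection examples I found online. I keep getting an error saying that one of the variables is not in the index.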

import statsmodels.api as sm
import pandas as pd
from sklearn.model_selection import train_test_split
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets import load_boston
boston_dataset = load_boston()

# create a DataFrame of features and an array of targets from the Boston data
X = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
y = boston_dataset.target


# split data into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=5)

This is the regression loop I took from this website; there is also a nearly identical piece of code here

def forward_regression(X, y, initial_list=[], threshold_in=0.01, threshold_out=0.05, verbose=True):
    initial_list = []
    included = list(initial_list)
    while True:
        changed = False
        # forward step
        excluded = list(set(X.columns) - set(included))
        new_pval = pd.Series(index=excluded)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(X[included + [new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            best_feature = new_pval.argmin()
            included.append(best_feature)
            changed = True
            if verbose:
                print('Add  {:30} with p-value {:.6}'.format(best_feature, best_pval))

        if not changed:
            break

    return included

As soon as I run forward_regression(X_train, Y_train), I get a 'not in index' error.

Any suggestions would be appreciated!

Solution

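You need to use idxmin() instead of argmin(). In pandas 1.0 and later, Series.argmin() returns the integer position of the minimum, whereas idxmin() returns its index label. Because new_pval is indexed by column name, argmin() puts an integer position into included, and the next X[included + [new_column]] lookup fails because no column has that name. That failed lookup is the 'not in index' error.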
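The difference is easy to see on a small Series. The following is a minimal sketch, assuming pandas 1.0 or later; the p-values and the two-row frame are made-up illustration data, not values from the Boston dataset.

```python
import pandas as pd

# Hypothetical p-values keyed by column name, mimicking new_pval in the function.
pvals = pd.Series({"CRIM": 0.20, "LSTAT": 0.001, "RM": 0.03})

position = pvals.argmin()  # integer position of the minimum (pandas >= 1.0)
label = pvals.idxmin()     # index label of the minimum

# Tiny made-up frame with the same column names.
frame = pd.DataFrame({"CRIM": [0.1, 0.2], "LSTAT": [4.0, 9.1], "RM": [6.5, 6.0]})

# Selecting columns with the integer position fails: no column is named 1.
try:
    frame[[position]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting with the label works.
selected = frame[[label]]

print(position, label, lookup_failed)
```

Only the label can be used to select a column by name, which is why the loop needs idxmin().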

The corrected function is:

def forward_regression(X, y, initial_list=[], threshold_in=0.01, threshold_out=0.05, verbose=True):
    # Copy the starting features so the caller's list is never mutated
    included = list(initial_list)
    while True:
        changed = False
        # forward step: fit one model per candidate column not yet included
        excluded = list(set(X.columns) - set(included))
        # explicit dtype avoids the object-dtype default for an empty Series
        new_pval = pd.Series(index=excluded, dtype=float)
        for new_column in excluded:
            model = sm.OLS(y, sm.add_constant(pd.DataFrame(X[included + [new_column]]))).fit()
            new_pval[new_column] = model.pvalues[new_column]
        best_pval = new_pval.min()
        if best_pval < threshold_in:
            # Change argmin -> idxmin so we get the column label, not its position
            best_feature = new_pval.idxmin()
            included.append(best_feature)
            changed = True
            if verbose:
                print('Add  {:30} with p-value {:.6}'.format(best_feature, best_pval))

        # stop when no remaining column passes threshold_in
        if not changed:
            break

    return included
