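Why do point-biserial correlation and logistic regression give different results on the Titanic dataset?
I want to compare what logistic regression and the point-biserial correlation say about the continuous columns, but the two give different results.
For example: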
## CORRELATION CHECK: NUMERIC INPUTS VS BINARY (0/1) CATEGORICAL OUTPUT (pointbiserialr)
# The point-biserial correlation correlates a binary variable Y and a continuous variable X.
import pandas as pd
from scipy import stats
column_names = ["col_name", "correlation", "p_value"]
corr_numeric = pd.DataFrame(columns=column_names)  # holds the correlation and p-value for each column
list1 = df_train['Survived']
for col in df_train[con_input_Col_list].columns:
    list2 = df_train[col]
    # Compute the point-biserial correlation once and unpack both values
    corr, p_value = stats.pointbiserialr(list1, list2)
    corr_numeric.loc[-1] = [col, corr, p_value]
    corr_numeric.index = corr_numeric.index + 1
# Print every column, then only those with |correlation| > 0.25
print(corr_numeric)
print("-------------------------")
print(corr_numeric.query("correlation>0.25 or correlation<-0.25"))
Result:
  col_name  correlation       p_value
0     fare     0.257307  6.120189e-15
For the logistic regression check:
## CORRELATION CHECK: NUMERIC INPUTS VS BINARY (0/1) CATEGORICAL OUTPUT (Log. Reg.)
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
# Defining X and y
y = df_train['Survived']
X = df_train[con_input_Col_list]
# Adding the column of ones so the model can fit an intercept.
X = sm.add_constant(X)
# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# sm.Logit has no random_state parameter, so it is not passed here
model = sm.Logit(y_train, X_train)
result = model.fit()
#print(result.summary())
# Fetching the statistics
stat_df = pd.DataFrame({'coefficients': result.params, 'p-value': result.pvalues, 'odds_ratio': np.exp(result.params)})
display(stat_df)  # display() is available in Jupyter/IPython
# Condition for significant parameters
significant_params = stat_df[stat_df['p-value'] <= 0.05].index
# errors='ignore' avoids a KeyError when the intercept is not significant
significant_params = significant_params.drop('const', errors='ignore')
significant_params
Index(['Age','SibSp','fare'],dtype='object')
The point-biserial check shows only the fare column as having a good correlation score, whereas the logistic regression check says 'Age', 'SibSp', and 'fare' are all fine. Why?
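One thing worth checking is that the two snippets apply different selection criteria: the point-biserial output is filtered by effect size (|correlation| > 0.25), while the logistic regression output is filtered by p-value (p <= 0.05). The sketch below illustrates that gap on synthetic data, not on the Titanic data; the variable names `strong` and `weak`, the sample size, and the effect sizes are all made up for illustration. It does not cover the other differences between the two checks, namely that the logistic model estimates each coefficient conditional on the other columns and is fitted on the 80% training split only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000                        # hypothetical sample size, not the Titanic row count

# Hypothetical binary outcome and two synthetic continuous predictors
y = rng.integers(0, 2, size=n)
strong = 1.0 * y + rng.normal(size=n)  # clearly related to y
weak = 0.3 * y + rng.normal(size=n)    # only weakly related to y

results = {}
for name, x in [("strong", strong), ("weak", weak)]:
    r, p = stats.pointbiserialr(y, x)
    passes_r = abs(r) > 0.25  # effect-size filter used in the correlation snippet
    passes_p = p <= 0.05      # significance filter used in the regression snippet
    results[name] = (passes_r, passes_p)
    print(f"{name}: r={r:.3f}, p={p:.2e}, |r|>0.25: {passes_r}, p<=0.05: {passes_p}")
```

If a column behaves like `weak` here, it would be dropped by the correlation threshold yet still count as significant by p-value, so filtering `corr_numeric` on `p_value <= 0.05` instead may give a fairer comparison with the regression output.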