
Multivariate correlation filter

How to approach a multivariate correlation filter

How can I detect whether two categorical features, taken together, are associated with a target variable?

For example:

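Suppose the data has three columns: two categorical features and one target variable. When I used the chi-square test to check each feature against the target individually, I could not find a strong relationship. So I want to check whether the combination of the two features is associated with the target, but I am not sure whether the chi-square test can be used in this situation or whether some other method is needed.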
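One common way to test a combination of two categorical features is to merge them into a single composite variable (one level per observed pair of values) and run the same chi-square test of independence against the target. Cramér's V adds an effect size that stays comparable across tables of different sizes. The sketch below is illustrative only: it assumes a hypothetical toy dataset with columns `feature_a`, `feature_b` and `target`, built so that the target is an XOR of the two features, and `chi2_and_cramers_v` is a hypothetical helper, not a library function.

```python
import itertools

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_and_cramers_v(x: pd.Series, y: pd.Series):
    """Chi-square test of independence between x and y, plus Cramér's V."""
    table = pd.crosstab(x, y)
    # correction=False: skip the Yates continuity correction so V uses the plain statistic
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    v = float(np.sqrt(stat / (n * k))) if k > 0 else 0.0
    return stat, p, dof, v


# Hypothetical toy data: 50 rows for each (feature_a, feature_b) pair,
# and the target is "yes" exactly when the two features differ (XOR)
rows = [(a, b) for a, b in itertools.product([0, 1], repeat=2) for _ in range(50)]
df = pd.DataFrame(rows, columns=["feature_a", "feature_b"])
df["target"] = np.where(df["feature_a"] != df["feature_b"], "yes", "no")

# 1) Each feature on its own against the target
for col in ["feature_a", "feature_b"]:
    stat, p, dof, v = chi2_and_cramers_v(df[col], df["target"])
    print(f"{col}: chi2={stat:.1f}, dof={dof}, p={p:.3f}, V={v:.2f}")

# 2) Both features merged into one composite categorical variable
combo = df["feature_a"].astype(str) + "_" + df["feature_b"].astype(str)
stat, p, dof, v = chi2_and_cramers_v(combo, df["target"])
print(f"combined: chi2={stat:.1f}, dof={dof}, p={p:.3g}, V={v:.2f}")
```

In this toy case each feature alone gives chi2 = 0 and V = 0, while the composite gives V = 1: no marginal association, but a perfect joint one. Combining levels multiplies the number of cells, so on real data it is worth checking that the expected counts do not become too sparse.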

For example:

import pandas as pd
from scipy.stats import chi2, chi2_contingency

# Sample the rows once so that all three columns come from the same rows
sample = df_offer_details.sample(frac=0.5, replace=True, random_state=1)

# Rows: hike bin; columns: each (relocation status, acceptance status) pair
ct_reloc_status = pd.crosstab(sample['percentage_hike_offered_bin'],
                              [sample['Candidate relocation status'], sample['Acceptance status']])
ct_reloc_status

# We carry out a chi-square contingency test to check whether the target variable
# (Acceptance status) is associated with relocation status.
# Note: with the hike bin as rows, this table tests the hike bin against the
# joint (relocation status, acceptance status) categories.
H0 = "There is no relationship between Relocation status and Acceptance status"
Ha = "There is a relationship between Relocation status and Acceptance status"

stat, p, dof, expected = chi2_contingency(ct_reloc_status)
print('p-value: ', p)

prob = 0.95
critical = chi2.ppf(prob, dof)
print('probability=%.3f,critical=%.3f,stat=%.3f' % (prob, critical, stat))

# The chi-square statistic is never negative, and stat >= critical
# is equivalent to p <= 1 - prob (= 0.05)
if stat >= critical:
    print(f'Since p-value {p} < 0.05 we reject the null hypothesis: {H0}. Thus alternate hypothesis: {Ha} holds good')
else:
    print(f'Fail to reject the null hypothesis: {H0}')
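The chi-square test itself works on any two-way table, so the main question is how the table is laid out. A minimal sketch, assuming the two features should jointly form the rows and `Acceptance status` alone the columns: passing a list of Series as the `pd.crosstab` index gives one row per observed feature combination. The data below is a hypothetical random toy frame that only reuses the question's column names, `joint_chi2` is a hypothetical helper, and the threshold of 5 for expected counts is a common rule of thumb rather than a hard limit.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def joint_chi2(df, features, target, min_expected=5):
    """Chi-square test of the joint categories of `features` against `target`."""
    # A list of Series as the index -> MultiIndex rows (one per feature combination);
    # the target alone forms the columns
    table = pd.crosstab([df[f] for f in features], df[target])
    stat, p, dof, expected = chi2_contingency(table)
    # Share of cells whose expected count is below the rule-of-thumb threshold
    share_small = float((expected < min_expected).mean())
    return table, stat, p, dof, share_small


# Hypothetical toy data reusing the question's column names (values are made up)
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "percentage_hike_offered_bin": rng.choice(["low", "mid", "high"], size=n),
    "Candidate relocation status": rng.choice(["Yes", "No"], size=n),
    "Acceptance status": rng.choice(["Joined", "Not joined"], size=n),
})

table, stat, p, dof, share_small = joint_chi2(
    toy,
    ["percentage_hike_offered_bin", "Candidate relocation status"],
    "Acceptance status",
)
print(table)
print(f"chi2={stat:.3f}, dof={dof}, p={p:.3f}, cells with expected<5: {share_small:.0%}")
```

With 3 hike bins and 2 relocation values the table has 6 rows and 2 columns, so dof = (6 - 1)(2 - 1) = 5. Since the toy columns are drawn independently, no real association is expected here; the point is only the table layout and the expected-count check.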

Result:


p-value:  0.019814129159194147
probability=0.950,critical=28.869,stat=32.380
Since p-value 0.019814129159194147 < 0.05 we reject the null hypothesis: There is no relationship between Relocation status and Acceptance status.Thus alternate hypothesis: There is a relationship between Relocation status and Acceptance status holds good 
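The two decision rules in the code (compare `stat` with the critical value, or compare `p` with 0.05) are equivalent. As a quick arithmetic check, assuming dof = 18 (the degrees of freedom for which `chi2.ppf(0.95, dof)` equals the printed critical value 28.869), the printed statistic should reproduce the printed p-value of about 0.0198:

```python
from scipy.stats import chi2

dof = 18          # assumption: inferred from the printed critical value 28.869
stat = 32.380     # statistic printed in the result
critical = chi2.ppf(0.95, dof)
p = chi2.sf(stat, dof)  # survival function = 1 - CDF, i.e. the p-value
print(f"critical={critical:.3f}, p={p:.4f}")
# stat >= critical holds exactly when p <= 0.05, so both rules give the same decision
print(stat >= critical, p <= 0.05)
```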

But I'm not sure whether this is the right approach.
