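How to fix a scipy t-test that returns NaN even though the data contains no NaN values
I am writing code that computes the t-statistic and p-value for heights grouped by sex, using data from an Excel spreadsheet (1 and 0 distinguish the sexes in the sheet). I also need to use different ranges of heights: the first 10 heights, then the first 20, and finally the first 30. However, it always returns "nan" instead of a number, even when I pass nan_policy='omit'. I know another user here had the same problem, but he was using pandas, which I am not. I am using Spyder 4 and the latest version of Anaconda, with Python 3.8.3 and SciPy 1.5.0. The code is as follows: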
import numpy as np
import scipy.stats
array = np.loadtxt(r'C:\filepath\Body-Data.csv',skiprows = 1,delimiter=',' )
slice10 = slice(0,10)
slice20 = slice(0,20)
slice30 = slice(0,30)
men_height = []
women_height = []
for i in range(8239):
    if array[i,0] == 0:
        women_height.append(array[i,2])
    elif array[i,0] == 1:
        men_height.append(array[i,2])
w_height10 = women_height[slice10]
w_height20 = women_height[slice20]
w_height30 = women_height[slice30]
m_height10 = men_height[slice10]
m_height20 = men_height[slice20]
m_height30 = men_height[slice30]
w_mean10 = np.mean(w_height10)
w_mean20 = np.mean(w_height20)
w_mean30 = np.mean(w_height30)
m_mean10 = np.mean(m_height10)
m_mean20 = np.mean(m_height20)
m_mean30 = np.mean(m_height30)
t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')
print("this is the t-statistic for the first 10 heights of women and men: \n",t_statistic1)
print("this is the p-value for the first 10 heights of women and men: \n",p_value1)
t_statistic2,p_value2 = scipy.stats.ttest_ind(m_mean20,w_mean20,nan_policy='omit')
print("this is the t-statistic for the first 20 heights of women and men: \n",t_statistic2)
print("this is the p-value for the first 20 heights of women and men: \n",p_value2)
t_statistic3,p_value3 = scipy.stats.ttest_ind(m_mean30,w_mean30,nan_policy='omit')
print("this is the t-statistic for the first 30 heights of women and men: \n",t_statistic3)
print("this is the p-value for the first 30 heights of women and men: \n",p_value3)
My output is:
this is the t-statistic for the first 10 heights of women and men:
nan
this is the p-value for the first 10 heights of women and men:
nan
this is the t-statistic for the first 20 heights of women and men:
nan
this is the p-value for the first 20 heights of women and men:
nan
this is the t-statistic for the first 30 heights of women and men:
nan
this is the p-value for the first 30 heights of women and men:
nan
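Solution
The first two arguments of scipy.stats.ttest_ind must be the data sets being compared, not the means of those data sets. A mean is a single number, so each "sample" has only one observation, its sample variance is undefined, and the result is nan. Change this line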
t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')
to
t_statistic1,p_value1 = scipy.stats.ttest_ind(m_height10,w_height10,nan_policy='omit')
(If there are no nan values in the input, you can remove the nan_policy='omit' argument.)
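To make the difference concrete, here is a minimal sketch that runs the test both ways. The heights are synthetic values generated for illustration (they are not the asker's Body-Data.csv), and the variable names bad and results are my own. It assumes the same three sample sizes as in the question.

```python
import warnings

import numpy as np
import scipy.stats

# Synthetic heights in cm, made up for illustration only.
rng = np.random.default_rng(0)
men_height = rng.normal(175.0, 7.0, size=30)
women_height = rng.normal(162.0, 6.0, size=30)

# Wrong: passing the two means. Each "sample" then holds a single value,
# so the variance estimate is undefined and the result is nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the small-sample / divide warnings
    bad = scipy.stats.ttest_ind(np.mean(men_height[:10]),
                                np.mean(women_height[:10]))
print("means passed in:", bad.statistic, bad.pvalue)

# Right: passing the raw observations for the first 10, 20 and 30 heights.
results = {}
for n in (10, 20, 30):
    t_statistic, p_value = scipy.stats.ttest_ind(men_height[:n],
                                                 women_height[:n])
    results[n] = (t_statistic, p_value)
    print(f"first {n} heights: t = {t_statistic:.3f}, p = {p_value:.3g}")
```

The loop also replaces the three copies of the same call in the question, which makes it harder to pass the wrong variable by accident.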
For other variations on computing the t-statistic, see "Perform 2 sample t-test".
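One such variation, if you really do want to start from summary statistics, is scipy.stats.ttest_ind_from_stats, which takes the mean, sample standard deviation and number of observations of each group. The sketch below uses small made-up arrays (not data from the question) and checks that it agrees with ttest_ind on the raw values; note that the standard deviations must be computed with ddof=1.

```python
import numpy as np
import scipy.stats

# Made-up heights in cm, for illustration only.
men = np.array([178.0, 172.5, 181.0, 169.0, 175.5, 183.0, 177.0, 171.0])
women = np.array([163.0, 158.5, 166.0, 160.0, 170.5, 161.0, 165.5, 159.0])

# Test on the raw observations.
t_raw, p_raw = scipy.stats.ttest_ind(men, women)

# Same test from summary statistics: mean, sample std (ddof=1), and count.
t_sum, p_sum = scipy.stats.ttest_ind_from_stats(
    mean1=np.mean(men), std1=np.std(men, ddof=1), nobs1=len(men),
    mean2=np.mean(women), std2=np.std(women, ddof=1), nobs2=len(women),
)

print("from raw data:     ", t_raw, p_raw)
print("from summary stats:", t_sum, p_sum)
```

The means alone are never enough: the test also needs the spread and the size of each group, which is why passing only m_mean10 and w_mean10 could not work.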