T-test with scipy returns NaN even though no NaN values are present


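I am writing code that computes the t-value and p-value for data in an Excel spreadsheet, comparing heights by gender (1 and 0 are used to distinguish gender in the Excel sheet). I also need to use different sample sizes: the first 10 heights, then the first 20 heights, and finally the first 30 heights. However, it always returns "nan" instead of a number, even when I pass nan_policy='omit'. I know another user here had the same problem, but he was using pandas, which I am not. I am using Spyder 4 and the latest version of Anaconda, with Python 3.8.3 and scipy 1.5.0. The code is as follows: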

import numpy as np
import scipy.stats

array = np.loadtxt(r'C:\filepath\Body-Data.csv',skiprows = 1,delimiter=',' )

slice10 = slice(0,10)
slice20 = slice(0,20)
slice30 = slice(0,30)

men_height = []
women_height = []

for i in range(8239):
    if array[i,0] == 0:
        women_height.append(array[i,2])
    elif array[i,0] == 1:
        men_height.append(array[i,2])
        
w_height10 = women_height[slice10]
w_height20 = women_height[slice20]
w_height30 = women_height[slice30]

m_height10 = men_height[slice10]
m_height20 = men_height[slice20]
m_height30 = men_height[slice30]

w_mean10 = np.mean(w_height10) 
w_mean20 = np.mean(w_height20) 
w_mean30 = np.mean(w_height30) 
    
m_mean10 = np.mean(m_height10)
m_mean20 = np.mean(m_height20)
m_mean30 = np.mean(m_height30)

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')
print("this is the t-statistic for the first 10 heights of women and men: \n",t_statistic1)
print("this is the p-value for the first 10 heights of women and men: \n",p_value1)


t_statistic2,p_value2 = scipy.stats.ttest_ind(m_mean20,w_mean20,nan_policy='omit')
print("this is the t-statistic for the first 20 heights of women and men: \n",t_statistic2)
print("this is the p-value for the first 20 heights of women and men: \n",p_value2)


t_statistic3,p_value3 = scipy.stats.ttest_ind(m_mean30,w_mean30,nan_policy='omit')
print("this is the t-statistic for the first 30 heights of women and men: \n",t_statistic3)
print("this is the p-value for the first 30 heights of women and men: \n",p_value3)

My output is:

this is the t-statistic for the first 10 heights of women and men: 
 nan

this is the p-value for the first 10 heights of women and men: 
 nan

this is the t-statistic for the first 20 heights of women and men: 
 nan

this is the p-value for the first 20 heights of women and men: 
 nan

this is the t-statistic for the first 30 heights of women and men: 
 nan

this is the p-value for the first 30 heights of women and men: 
 nan

Solution

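The first two arguments of scipy.stats.ttest_ind must be the data sets to be compared, not the means of those data sets. A mean is a single number, so each "sample" then contains only one observation, its variance is undefined, and the result is nan. Change this line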

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')

to

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_height10,w_height10,nan_policy='omit')

(If there are no nan values in the input, you can remove the argument nan_policy='omit'.)
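To make the difference concrete, here is a minimal sketch that contrasts the two calls. It does not use the asker's Body-Data.csv; the arrays men_height and women_height below are synthetic stand-ins with invented values, and the warning filter is only there to keep the output quiet.

```python
import math
import warnings

import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the asker's file); the values are invented.
rng = np.random.default_rng(0)
men_height = rng.normal(175.0, 7.0, size=30)
women_height = rng.normal(162.0, 6.5, size=30)

# Passing the means: each "sample" is one number, so the sample variance
# (and therefore the t statistic and p-value) is undefined.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the small-sample warnings
    t_bad, p_bad = stats.ttest_ind(np.mean(men_height[:10]),
                                   np.mean(women_height[:10]))
print("means as input:", t_bad, p_bad)

# Passing the data sets themselves, for the first 10, 20 and 30 heights.
results = {}
for n in (10, 20, 30):
    t_stat, p_value = stats.ttest_ind(men_height[:n], women_height[:n])
    results[n] = (t_stat, p_value)
    print(f"first {n} heights: t = {t_stat:.3f}, p = {p_value:.3g}")
```

Looping over the sample sizes also avoids repeating the same call three times with separately named variables.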

For other ways of computing the t-statistic, see "Perform 2 sample t-test".
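As a rough illustration of two such variants, the sketch below uses scipy.stats.ttest_ind_from_stats, which takes summary statistics (mean, standard deviation, sample size) rather than raw data, and Welch's test via equal_var=False. The arrays group_a and group_b are hypothetical synthetic data, not values from the question.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic samples; the values are invented for illustration.
rng = np.random.default_rng(1)
group_a = rng.normal(175.0, 7.0, size=20)
group_b = rng.normal(162.0, 6.5, size=20)

# Standard two-sample t-test on the raw data (assumes equal variances).
t_raw, p_raw = stats.ttest_ind(group_a, group_b)

# Same test from summary statistics only. The standard deviations must be
# sample standard deviations (ddof=1) to match ttest_ind.
t_sum, p_sum = stats.ttest_ind_from_stats(
    mean1=np.mean(group_a), std1=np.std(group_a, ddof=1), nobs1=len(group_a),
    mean2=np.mean(group_b), std2=np.std(group_b, ddof=1), nobs2=len(group_b),
)

# Welch's t-test, which does not assume equal variances.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print("raw data:      ", t_raw, p_raw)
print("summary stats: ", t_sum, p_sum)
print("Welch:         ", t_welch, p_welch)
```

If only the means are available, neither variant can be computed: a t-test needs a measure of spread and the sample sizes as well.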