T-test with scipy returns NaN even though the data contains no NaN values

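I am writing code that computes the t value and p value for height data in an Excel spreadsheet, split by gender (1 and 0 distinguish the genders in the sheet). I also need to use different ranges of heights: the first 10 heights, then the first 20, and finally the first 30. However, it always returns "nan" instead of a number, even when I pass nan_policy='omit'. I know another user here had the same problem, but he was using pandas, which I am not. I am using Spyder 4 and the latest version of Anaconda, with Python 3.8.3 and scipy 1.5.0. The code is below: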

import numpy as np
import scipy.stats

array = np.loadtxt(r'C:\filepath\Body-Data.csv',skiprows = 1,delimiter=',' )

slice10 = slice(0,10)
slice20 = slice(0,20)
slice30 = slice(0,30)

men_height = []
women_height = []

for i in range(8239):
    if array[i,0] == 0:
        women_height.append(array[i,2])
    elif array[i,0] == 1:
        men_height.append(array[i,2])
        
w_height10 = women_height[slice10]
w_height20 = women_height[slice20]
w_height30 = women_height[slice30]

m_height10 = men_height[slice10]
m_height20 = men_height[slice20]
m_height30 = men_height[slice30]

w_mean10 = np.mean(w_height10) 
w_mean20 = np.mean(w_height20) 
w_mean30 = np.mean(w_height30) 
    
m_mean10 = np.mean(m_height10)
m_mean20 = np.mean(m_height20)
m_mean30 = np.mean(m_height30)

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')
print("this is the t-statistic for the first 10 heights of women and men: \n",t_statistic1)
print("this is the p-value for the first 10 heights of women and men: \n",p_value1)


t_statistic2,p_value2 = scipy.stats.ttest_ind(m_mean20,w_mean20,nan_policy='omit')
print("this is the t-statistic for the first 20 heights of women and men: \n",t_statistic2)
print("this is the p-value for the first 20 heights of women and men: \n",p_value2)


t_statistic3,p_value3 = scipy.stats.ttest_ind(m_mean30,w_mean30,nan_policy='omit')
print("this is the t-statistic for the first 30 heights of women and men: \n",t_statistic3)
print("this is the p-value for the first 30 heights of women and men: \n",p_value3)

My output is:

this is the t-statistic for the first 10 heights of women and men: 
 nan

this is the p-value for the first 10 heights of women and men: 
 nan

this is the t-statistic for the first 20 heights of women and men: 
 nan

this is the p-value for the first 20 heights of women and men: 
 nan

this is the t-statistic for the first 30 heights of women and men: 
 nan

this is the p-value for the first 30 heights of women and men: 
 nan

Solution

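The first two arguments of scipy.stats.ttest_ind must be the data sets being compared, not the means of those data sets. Each mean is a single number, so the test sees two samples of size one, for which the sample variance is undefined, and the result is nan. Change this line

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_mean10,w_mean10,nan_policy='omit')

to

t_statistic1,p_value1 = scipy.stats.ttest_ind(m_height10,w_height10,nan_policy='omit')

and make the same change in the calls for the first 20 and first 30 heights.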

(If the input contains no nan values, you can drop the nan_policy='omit' argument.)
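
To make the difference concrete, here is a minimal sketch that runs the test both ways. The heights are synthetic values drawn from a random generator (an assumption for illustration, not the contents of Body-Data.csv), and each mean is wrapped in a one-element list to stand in for the single number that was passed in the question.

```python
import warnings

import numpy as np
import scipy.stats

# Synthetic heights in cm (made-up values, not taken from Body-Data.csv).
rng = np.random.default_rng(0)
men_height = rng.normal(175.0, 7.0, size=30)
women_height = rng.normal(162.0, 6.0, size=30)

# Passing the means: each "sample" holds one observation, so the sample
# variance (which divides by n - 1 = 0) is undefined and the result is nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the small-sample warnings
    t_bad, p_bad = scipy.stats.ttest_ind([np.mean(men_height[:10])],
                                         [np.mean(women_height[:10])])
print("from means:", t_bad, p_bad)

# Passing the samples themselves gives a finite t statistic and p-value.
results = {}
for n in (10, 20, 30):
    t, p = scipy.stats.ttest_ind(men_height[:n], women_height[:n])
    results[n] = (t, p)
    print(f"first {n} heights: t = {t:.3f}, p = {p:.3g}")
```

The loop over (10, 20, 30) also replaces the three copied call sites from the question, so the fix only has to be made in one place.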

For other ways of computing the t statistic, see "Perform 2 sample t-test".
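
One such variant, sketched here on the assumption that only summary statistics are at hand, is scipy.stats.ttest_ind_from_stats, which takes the mean, standard deviation and sample size of each group instead of the raw data. The arrays a and b below are synthetic and only serve to show that it agrees with ttest_ind when the standard deviations are computed with ddof=1. The last call shows Welch's test (equal_var=False), which does not assume equal variances.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_ind_from_stats

# Synthetic samples (illustrative values only).
rng = np.random.default_rng(1)
a = rng.normal(175.0, 7.0, size=20)
b = rng.normal(162.0, 6.0, size=25)

# Test computed from the raw data.
t_data, p_data = ttest_ind(a, b)

# Same test computed from summary statistics; std must use ddof=1.
t_stats, p_stats = ttest_ind_from_stats(
    mean1=np.mean(a), std1=np.std(a, ddof=1), nobs1=len(a),
    mean2=np.mean(b), std2=np.std(b, ddof=1), nobs2=len(b),
)

# Welch's t-test, which drops the equal-variance assumption.
t_welch, p_welch = ttest_ind(a, b, equal_var=False)

print(t_data, t_stats, t_welch)
```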
