CDF does not start at zero

How to fix a CDF that does not start at zero

I have data that contains no infinite values, as shown below:

data<-c(6.87,0.11,0.03,0.08,0.01,0.13,0.17,0.53,2.69,5.44,4.71,3.57,0.38,0.31,0.45,0.02,0.09,0.43,0.23,2.31,0.96,1.28,0.07,5.99,6.23,2.95,0.04,0.98,17.2,0.25,1.99,1.94,1.06,0.05,0.66,0.57,0.54,0.75,0.65,7.35,0.22,1.97,3.78,0.28,0.06,0.14,1.82,9.52,10.38,29.09,0.1,0.4,0.97,0.33,0.16,40.19,5.02,0.79,4.78,0.44,15.29,7.26,6.47,37.66,0.15,1.42,7.43,1.45,1.52,0.27,1.11,2.41,1.74,3.75,0.18,0.39,0.78,0.88,1.23,0.6,0.12,2.71,0.49,0.84,0.48,5.34,5.23,2.37,1.55,3.29,7.62,0.24,2.4,1.57,0.56,6.1,0.2,5.79,0.91,0.95,1.3,0.64,1.14,2.18,0.19,3.9,5.93,4.06,0.41,2.21,8.07,1.73,1.37,4.68,0.26,2.74,0.42,0.69,0.86,0.37,5.25,1.1,0.68,1.95,5.78,3.05,1.65,0.85,0.34,0.7,2.01,0.77,0.47,12.31,7.54,1.02,1.41,0)

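When I try to plot the CDF, the curve does not start at zero; it starts at 0.57. I read that infinite values and zeros can cause this problem. To work around it, I replaced the zeros with a very small value.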

data[data==0]<-0.000001
plot(ecdf(data),xlim = c(min(data),max(data)))


But I still get the same result. Why is that?

Solution

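I am not sure it makes sense to display the empirical CDF for values outside the observed data. The ECDF is zero for all values below min(data). You can try the following, which starts the curve slightly to the left of the minimum: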

f = ecdf(data)
curve(f,min(data) - 0.01,max(data))
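
A minimal sketch of the point above, using a made-up vector x rather than the data from the question: the ECDF is exactly 0 to the left of the smallest observation and jumps at that observation by the fraction of points equal to it.

```r
# Hypothetical toy data: three of the five values are tied at the minimum.
x <- c(0, 0, 0, 1, 2)
f <- ecdf(x)

f(min(x) - 0.01)  # just left of the minimum: the ECDF is 0 here
f(min(x))         # at the minimum: proportion of points <= min(x), i.e. 3/5
```

So a plot whose x-axis begins exactly at min(x) never shows the flat segment at height 0; it only becomes visible once the plotting range extends below the minimum.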

As the comments on the question point out, your data contains many zeros: they make up about 57% of all data points.

mean(data == 0)
#[1] 0.5794045

If you plot the ECDF together with a horizontal line at that ordinate, you will see that the ECDF starts there.

plot(ecdf(data))
abline(h = mean(data == 0))
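
The following sketch suggests why replacing the zeros with a tiny positive number does not help. It again uses a made-up vector x, not the data from the question. The replacement moves the location of the first step but not its height, because the same share of points is still tied at the new minimum.

```r
# Hypothetical toy data in which 60% of the values are zero.
x <- c(0, 0, 0, 1, 2)

# Replace the zeros with a tiny positive value, as attempted in the question.
y <- x
y[y == 0] <- 0.000001

mean(x == 0)     # share of zeros in the original vector
ecdf(x)(min(x))  # height of the first ECDF step before the replacement
ecdf(y)(min(y))  # height of the first step after the replacement (unchanged)

# Extending xlim below the minimum makes the flat segment at height 0 visible.
plot(ecdf(y), xlim = c(min(y) - 1, max(y)))
```

Under these assumptions, the first step has the same height in both cases, so the only change that makes the curve visibly start at zero is widening the x-axis to the left of the minimum.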
