
How to solve: multiple imputation of categorical missing values in R

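I am trying to use multiple imputation in R to handle missing values in a data set. The variable with missing values measures poverty and is split into categories according to whether a person's income falls below 100%, 75%, 200%, etc. of the federal poverty level. A code of 99 or 98 in the POVERTY variable means the value is NA and needs to be imputed. The NEWEDUC variable is also categorical and represents different levels of education; no values need to be imputed for it.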

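I have tried the missMDA and FactoMineR packages. My code is below, and I hope someone can help me resolve my errors. I found [this post][1], whose section 2.1 closely matches what I am trying to do. Thanks!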

library(missMDA)
library(FactoMineR)

# Shortened version of the data frame (the full data set has over 3000 observations)
dfNA= structure(list(POVERTY = structure(c(1L,8L,2L,6L,17L,7L,4L,1L,15L,5L,13L,10L,3L,16L,12L,9L,19L,1L),.Label = c("11","12","13","14","21","22","23","24","25","31","32","33","34","35","36","37","38","98","99"),class = "factor"),NEWEDUC = structure(c(1L,3L),.Label = c("1","2","3","4","5"),class = "factor")),row.names = 30:70,class = "data.frame")


#check the number of dimensions
dim(dfNA)

# From this point on, I am honestly not sure what the purpose of each step is; I was just following the website linked above.
res.mcaNA <- MCA(dfNA, quali.sup = 2)
nb <- estim_ncpMCA(dfNA, ncp.max = 2)
# This is the stage where I begin to struggle: I don't understand why I need to select particular rows/columns, or which ones to choose, to impute the 99 value.
res.impute <- imputeMCA(dfNA, ncp = 2)
res.impute$tab.disj[1:10, 10:21]
apply(res.impute$tab.disj, 1, sum)
# The output here still includes a 99 that I don't know how to instruct R to impute. Do I need to mark the code 99 as NA early on, with a line such as dfNA$POVERTY[dfNA$POVERTY == 99] <- NA?
res.impute$comp[, 1]
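
The recoding asked about in the last comment can be sketched in base R as follows. This is only a minimal illustration, not a tested fix for the full data set: the `toy` data frame and its values are hypothetical stand-ins for `dfNA`, and `missing_codes` is a helper name introduced here. It assumes, as stated in the question, that 98 and 99 are the only codes that mean "missing".

```r
# Hypothetical toy data in the same shape as dfNA (values invented for illustration).
toy <- data.frame(
  POVERTY = factor(c("11", "12", "99", "21", "98", "11", "22", "99")),
  NEWEDUC = factor(c("1", "3", "2", "5", "4", "1", "2", "3"))
)

# Codes that stand for "missing" in POVERTY (taken from the question).
missing_codes <- c("98", "99")

# POVERTY is a factor, so compare on its labels (character strings).
is_missing <- as.character(toy$POVERTY) %in% missing_codes
toy$POVERTY[is_missing] <- NA

# Drop the now-unused "98"/"99" levels so they are no longer treated as categories.
toy$POVERTY <- droplevels(toy$POVERTY)

# Quick checks: how many values are now NA, and which levels remain.
sum(is.na(toy$POVERTY))
levels(toy$POVERTY)
```

With the codes recoded this way before the `MCA`, `estim_ncpMCA` and `imputeMCA` calls, the data frame would contain genuine `NA` entries rather than a "99" category. Whether that alone resolves the problem on the real data is not verified here.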

Any help is greatly appreciated! Thanks.

[1]: http://juliejosse.com/wp-content/uploads/2018/06/DataAnalysisMissingR.html#2)_categoricalmixedmulti-block_data_with_missing_values
