
How to resolve: R

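I am trying to use multiple imputation in R to handle missing values in a data set. The variable with missing values measures poverty and is split into categories according to whether a person's income falls below 100%, 75%, 200%, etc. of the federal poverty level. When the poverty variable is coded 99 or 98, the value is effectively NA and needs to be imputed. The NEWEDUC variable is also categorical and represents different levels of education; no values need to be imputed for it.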
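As a minimal sketch of what treating 98 and 99 as NA could look like in base R: the toy `poverty` factor below is hypothetical and only mimics the coding described above. It is not taken from the real data set.

```r
# Hypothetical toy factor mimicking the POVERTY coding (values are illustrative only)
poverty <- factor(c("11", "24", "99", "12", "98", "31"))

# Treat the sentinel codes 98 and 99 as missing
poverty[poverty %in% c("98", "99")] <- NA

# Drop the now-unused "98" and "99" levels so they are no longer seen as real categories
poverty <- droplevels(poverty)

# Tabulate the remaining categories together with the count of missing values
table(poverty, useNA = "ifany")
```

Dropping the unused levels matters because a factor keeps its level set even after the matching values are gone, and an imputation method could otherwise still see "98" and "99" as candidate categories.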

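I have tried using the missMDA and FactoMineR packages. My code is attached below, and I hope someone can help me resolve my errors. I found [this post][1], and its section 2.1 closely matches what I am trying to do. Thanks!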

library(missMDA)
library(FactoMineR)

#here is a shortened version of the data frame (has over 3000 observations normally)
dfNA= structure(list(POVERTY = structure(c(1L,8L,2L,6L,17L,7L,4L,1L,15L,5L,13L,10L,3L,16L,12L,9L,19L,1L),.Label = c("11","12","13","14","21","22","23","24","25","31","32","33","34","35","36","37","38","98","99"),class = "factor"),NEWEDUC = structure(c(1L,3L),.Label = c("1","2","3","4","5"),class = "factor")),row.names = 30:70,class = "data.frame")


#check the number of dimensions
dim(dfNA)

#From this point on, I am honestly not entirely sure what the purpose of each of these steps is; I was just following the website linked above
res.mcaNA  <- MCA(dfNA,quali.sup =2)
nb <- estim_ncpMCA(dfNA,ncp.max=2)
res.impute <- imputeMCA(dfNA,ncp=2) #This is the stage where I begin to struggle: I don't understand why I need to select particular rows/columns, or which ones to choose, to impute the 99 value
res.impute$tab.disj[1:10,10:21] 
apply(res.impute$tab.disj,1,sum)
res.impute$comp[,1] #the output here still includes a 99 that I don't know how to instruct R to impute. Do I need to denote early on that the code 99 is NA, with a line such as dfNA$POVERTY[dfNA$POVERTY==99] <- NA?

Any help is greatly appreciated! Thank you.
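For reference, here is a hedged sketch of how `imputeMCA` behaves once missing values are stored as real `NA`s rather than as the codes 98/99. The `toy` data frame is hypothetical, and `ncp = 2` is an arbitrary choice made for illustration, not a recommendation for the real data.

```r
library(missMDA)

set.seed(1)
# Hypothetical toy data: two categorical variables, with true NAs in POVERTY.
# factor() does not create a level for NA, so no "98"/"99" levels exist here.
toy <- data.frame(
  POVERTY = factor(sample(c("11", "12", "21", "22", NA), 60, replace = TRUE)),
  NEWEDUC = factor(sample(c("1", "2", "3"), 60, replace = TRUE))
)

# Single imputation based on an MCA model; ncp = 2 is an assumption for this sketch
res.impute <- imputeMCA(toy, ncp = 2)

# completeObs holds the completed data frame, with the NAs filled in
summary(res.impute$completeObs)
```

Note that `imputeMCA` only fills in entries that are `NA`; a value stored as the level "99" is an ordinary category as far as the function is concerned. `imputeMCA` also produces a single completed data set. For multiple imputation, missMDA provides `MIMCA`, which may be closer to the stated goal.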

[1]: http://juliejosse.com/wp-content/uploads/2018/06/DataAnalysisMissingR.html#2)_categoricalmixedmulti-block_data_with_missing_values