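I am trying to use the missForest package to impute missing data in a fairly large dataset. Most of my variables are categorical, with many levels. When I run missForest, it imputes decimal values and sometimes even negative values. Clearly I am doing something wrong. Here is my process:
FIRST TRY: I entered the predictors directly, and decimal values were imputed into my dataset. I understand that missForest only takes a matrix, but I am not sure how to force it to recognize which columns are factors. Someone on another post recommended dummy coding, so I tried that next, with the same result. The code is below.
SECOND TRY: I dummy coded each predictor (very time-consuming) and then ran missForest on the result.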
homt_sub_dummy<-homt_sub[c("Psyprob.yes", "Psyprob.no","SUB2.2.0", "SUB2.2.1", "SUB2.2.2", "SUB2.2.3", "SUB2.2.4", "SUB2.2.5", "SUB2.2.6", "SUB2.2.7","Freq1.1", "Freq1.2", "Freq1.3", "Freq1.4","FRSTUSE1.0", "FRSTUSE1.1", "FRSTUSE1.2", "FRSTUSE1.3", "FRSTUSE1.4", "FRSTUSE1.5", "FRSTUSE1.6","FRSTUSE1.7", "FRSTUSE1.8", "FRSTUSE1.9", "FRSTUSE1.10", "FRSTUSE1.11","Freq2.1", "Freq2.2", "Freq2.3", "Freq2.4","AGEcont","Gender_male", "Gender_female", "Race2.0", "Race2.1", "Race2.2", "Arrests.0", "Arrests.1", "Arrests.2")]
homt_dummy_matrix<-data.matrix(homt_sub_dummy, rownames.force = NA)
homt_dummp.imp <- missForest(homt_dummy_matrix, verbose= TRUE, maxiter = 3, ntree = 20)
homt_dummy.imp.df<-as.data.frame(homt_dummp.imp$ximp)
View(homt_dummy.imp.df)
This is a chunk of the data.frame I saved with the imputed values.
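Any help would be greatly appreciated. I am comfortable with imputation in general and wanted to compare the results against MICE, but I can't seem to get missForest to work!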
Answer 0 (score: 0)
You can use the as.factor function to convert the relevant columns to categorical variables. For example:
# Convert the categorical columns to factors (V14 is overwritten in place)
cleveland_t <- transform(cleveland,
                         V2  = as.factor(V2),
                         V3  = as.factor(V3),
                         V6  = as.factor(V6),
                         V7  = as.factor(V7),
                         V9  = as.factor(V9),
                         V11 = as.factor(V11),
                         V12 = as.factor(V12),
                         V13 = as.factor(V13),
                         V14 = as.factor(V14))
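When there are many categorical columns, the same conversion can be written more compactly. This is only a minimal sketch on a made-up data frame: `toy` and the `cat_cols` vector are hypothetical and not taken from the question.

```r
# Hypothetical toy data: two categorical columns stored as numbers,
# plus one column that is genuinely continuous
toy <- data.frame(
  Freq1   = c(1, 2, NA, 4, 2, 1),
  Race2   = c(0, 1, 2, NA, 1, 0),
  AGEcont = c(23, 35, 41, NA, 52, 30)
)

# Names of the columns that should be treated as categorical (an assumption)
cat_cols <- c("Freq1", "Race2")

# Convert only those columns to factors; AGEcont stays numeric
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

# Inspect the resulting structure
str(toy)
```

Missing values stay `NA` after the conversion, so the data frame is still ready for imputation.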
Then use sapply to check the class of each column.
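Put together, the class check and the imputation might look like the sketch below. It assumes the missForest package is installed and uses a made-up data frame (`toy`), so the column names and sizes are illustrative only. The key point is that missForest is given the data frame with its factor columns, not the output of data.matrix(), which replaces factors with their integer codes and so leads to numeric (decimal) imputations.

```r
library(missForest)  # assumes the package is installed

set.seed(42)  # imputation is random; fix the seed for repeatability

# Hypothetical toy data, not the data from the question
n <- 100
toy <- data.frame(
  Gender  = factor(sample(c("male", "female"), n, replace = TRUE)),
  Freq1   = factor(sample(1:4, n, replace = TRUE)),
  AGEcont = round(rnorm(n, mean = 35, sd = 10))
)

# Knock out some values at random in every column
toy$Gender[sample(n, 10)]  <- NA
toy$Freq1[sample(n, 10)]   <- NA
toy$AGEcont[sample(n, 10)] <- NA

# Check the classes before imputing: categorical columns should be "factor"
sapply(toy, class)

# Pass the data frame itself; do not convert it with data.matrix()
toy_imp <- missForest(toy, maxiter = 3, ntree = 20)

# The imputed categorical columns are still factors, and no NA remains
sapply(toy_imp$ximp, class)
levels(toy_imp$ximp$Freq1)
anyNA(toy_imp$ximp)
```

With factor columns, the imputed values are drawn from the existing levels, so decimal or negative categories should no longer appear. Dummy coding is not needed for this approach.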