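I'm fairly new to R and I'm trying to make my recoding script more efficient and "correct". I tried searching the forum, but that got me nowhere. Maybe I used the wrong terms and missed it, so please bear with me if this has already been asked.
I have two factor variables that I want to combine into one factor variable. They come from the same survey and both measure education level. I have two variables because of an unfortunate survey structure, but that is beside the point. The key point is that they are mutually exclusive (a respondent can only have a value in one of them).
My data looks like this: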
education education2
9th grade <NA>
9th grade <NA>
<NA> 9th grade
<NA> 10th grade
10th grade <NA>
11th grade <NA>
<NA> 9th grade
<NA> 11th grade
<NA> <NA>
and my script looks like this:
highest.edu <- vector(length=length(df$education))
a.grade <- which(df$education=="9th grade")
a.grade2 <- which(df$education2=="9th grade")
b.grade <- which(df$education=="10th grade")
b.grade2 <- which(df$education2=="10th grade")
c.grade <- which(df$education=="11th grade")
c.grade2 <- which(df$education2=="11th grade")
highest.edu[a.grade] <- as.character(df$education)[a.grade]
highest.edu[a.grade2] <- as.character(df$education2)[a.grade2]
highest.edu[b.grade] <- as.character(df$education)[b.grade]
highest.edu[b.grade2] <- as.character(df$education2)[b.grade2]
highest.edu[c.grade] <- as.character(df$education)[c.grade]
highest.edu[c.grade2] <- as.character(df$education2)[c.grade2]
highest.edu <- factor(highest.edu)
highest.edu[highest.edu =="FALSE"] =NA
highest.edu <- factor(highest.edu)
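Of course this works well enough, but when you have two factor variables with 15 or more levels, you start looking for a faster option.
I tried something like this, but without luck: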
a.grade <- which(df$education=="9th grade" | df$education2=="9th grade")
b.grade <- which(df$education=="10th grade" | df$education2=="10th grade")
c.grade <- which(df$education=="11th grade" | df$education2=="11th grade")
highest.edu[a.grade] <- as.character(df$education)[a.grade]|as.character(df$education2)[a.grade]
highest.edu[b.grade] <- as.character(df$education)[b.grade]|as.character(df$education2)[b.grade]
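which gives me this: Error in as.character(df$education)[a.grade] | as.character(df$education2)[a.grade] : operations are possible only for numeric, logical or complex types
Is there a way around this?
Thanks in advance for any suggestions.
The result I'm aiming for is: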
highest.education
9th grade
9th grade
9th grade
10th grade
10th grade
11th grade
9th grade
11th grade
<NA>
The post "Joining factor levels of two columns in R" seems to be after a different result.
Thanks again.
Answer 0 (score: 1)
You have to make sure the result contains all the factor levels:
levels(education) <- c(levels(education), levels(education2))
education[is.na(education)] <- education2[is.na(education)]
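
For illustration, here is a self-contained sketch of the same idea. It assumes the data frame is called df as in the question, and it rebuilds the example data by hand. It uses union() so that levels shared by both columns appear only once; that is a small variation on the two lines above, not a different method.

```r
# Rebuild the question's example data (assumed to live in a data frame called df).
df <- data.frame(
  education  = factor(c("9th grade", "9th grade", NA, NA, "10th grade",
                        "11th grade", NA, NA, NA)),
  education2 = factor(c(NA, NA, "9th grade", "10th grade", NA,
                        NA, "9th grade", "11th grade", NA))
)

# Give the target factor every level that occurs in either column.
# union() drops duplicates, so shared levels are listed only once.
highest.edu <- df$education
levels(highest.edu) <- union(levels(df$education), levels(df$education2))

# Fill the gaps from the second column; rows missing in both stay NA.
missing <- is.na(highest.edu)
highest.edu[missing] <- df$education2[missing]

print(highest.edu)
```

Working on a copy (highest.edu) leaves both original columns untouched, which makes it easier to check the result against the source data.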
Answer 1 (score: 1)
It's easy once they are character strings:
# make them character types
ed <- levels(df$education)[df$education]
ed2 <- levels(df$education2)[df$education2]
# make one new factor that integrates them
ed[is.na(ed)] <- ed2[is.na(ed)]
# make it a factor again
ed <- factor(ed)
You can speed this up by reading the columns in as character in the first place, especially if you already set the column types in read.table.
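
A rough sketch of that last suggestion, assuming a comma-separated survey file. The inline text (raw) is a made-up stand-in for the real file, which the question does not show, and ifelse() is one of several ways to do the fill-in step.

```r
# Hypothetical inline data standing in for the survey file; in practice the
# `text` argument would be replaced by a file path.
raw <- "education,education2
9th grade,NA
NA,9th grade
NA,10th grade
11th grade,NA
NA,NA"

# colClasses = "character" keeps both columns as strings, so no factor
# conversion is needed afterwards.
df <- read.table(text = raw, header = TRUE, sep = ",",
                 colClasses = "character", na.strings = "NA")

# Take education where present, otherwise fall back to education2.
ed <- ifelse(is.na(df$education), df$education2, df$education)

# Convert to a factor once, at the end.
highest.edu <- factor(ed)
print(highest.edu)
```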
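Answer 2 (score: 0)
Basically, you need to make sure both factors use the same set of levels (the "union" or "intersection" of their unique levels) in the same order; then you can use c to join them. Search: [r] factor union levels.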
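
One possible reading of this advice, as a sketch: put both factors on an identical level set in an identical order, so their integer codes mean the same thing, and then merge them. The level order in lv is an assumption (natural grade order), and the merge step picks one value per row instead of concatenating with c(), because the question wants a single column of the original length.

```r
# Example data from the question (assumed data frame name: df).
df <- data.frame(
  education  = factor(c("9th grade", "9th grade", NA, NA, "10th grade",
                        "11th grade", NA, NA, NA)),
  education2 = factor(c(NA, NA, "9th grade", "10th grade", NA,
                        NA, "9th grade", "11th grade", NA))
)

# Assumed common level set, listed in natural grade order.
lv <- c("9th grade", "10th grade", "11th grade")

# Re-level both factors so they share the same levels in the same order.
e1 <- factor(df$education,  levels = lv)
e2 <- factor(df$education2, levels = lv)

# With identical levels the integer codes are comparable, so pick the
# code from whichever column is not missing.
codes <- ifelse(is.na(e1), as.integer(e2), as.integer(e1))

# Map the codes back to labels and rebuild the factor.
highest.edu <- factor(lv[codes], levels = lv)
print(highest.edu)
```

Listing the levels explicitly also keeps "9th grade" ahead of "10th grade", which the default alphabetical ordering of factor() would not do.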