OK, I have this data.frame:
A B C
1 yellow purple <NA>
2 <NA> <NA> yellow
3 orange yellow <NA>
4 orange <NA> brown
5 <NA> brown purple
6 yellow purple pink
7 purple green pink
8 yellow pink green
9 purple orange <NA>
10 purple <NA> brown
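I'd like to take the missing values, starting with the first column, and replace them with the values from the columns to their right, as in rows 2, 4, 5 and 10: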
A B C
1 yellow purple <NA>
2 yellow <NA> <NA>
3 orange yellow <NA>
4 orange brown <NA>
5 brown purple <NA>
6 yellow purple pink
7 purple green pink
8 yellow pink green
9 purple orange <NA>
10 purple brown <NA>
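My idea was to loop over the columns, find the rows with a missing value, and replace it with the value from the column to the right. That may be flawed, though: what if there are 4 columns and two of them, say columns 2 and 3, are both NA? Does anyone know an algorithm that would work?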
Answer 0 (score: 2)
We can loop over the rows, concatenate the non-NA elements followed by the NA elements, and assign the result back to the dataset:
df[] <- t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
df
# A B C
#1 yellow purple <NA>
#2 yellow <NA> <NA>
#3 orange yellow <NA>
#4 orange brown <NA>
#5 brown purple <NA>
#6 yellow purple pink
#7 purple green pink
#8 yellow pink green
#9 purple orange <NA>
#10 purple brown <NA>
# Data used in the example above
df <- structure(list(A = c("yellow", NA, "orange", "orange", NA, "yellow",
"purple", "yellow", "purple", "purple"), B = c("purple", NA,
"yellow", NA, "brown", "purple", "green", "pink", "orange", NA
), C = c(NA, "yellow", NA, "brown", "purple", "pink", "pink",
"green", NA, "brown")), .Names = c("A", "B", "C"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
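The question also asks about a case with 4 columns where two of them are NA. As a minimal sketch, the same row-wise idea can be wrapped in a helper and tried on such a case. The name `shift_na_right` and the data frame `df4` are made up for illustration and are not part of the original post. Note that `apply()` converts the data frame to a matrix first, so this sketch assumes all columns hold the same type (character here).

```r
# Hypothetical helper: in every row, keep the non-NA values in order
# and push the NA values to the right-hand end.
shift_na_right <- function(d) {
  d[] <- t(apply(d, 1, function(x) c(x[!is.na(x)], x[is.na(x)])))
  d
}

# Made-up four-column example; row 1 has NA in the first two columns,
# row 2 has NA in the two middle columns.
df4 <- data.frame(
  A = c(NA, "red"),
  B = c(NA, NA),
  C = c("blue", NA),
  D = c("green", "black"),
  stringsAsFactors = FALSE
)

res <- shift_na_right(df4)
res
# Row 1 becomes: blue, green, NA, NA
# Row 2 becomes: red, black, NA, NA
```

Because each row is handled independently, the number of columns and the number of NA values per row do not matter; the column names of the input are kept.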