Collapse rows for repeated observations and concatenate multiple non-duplicate values into a single variable in an R data frame

Time: 2015-01-27 14:16:45

Tags: r string dataframe

I'm fairly sure nobody has asked this before.

> have
     x1     x2        x3
1 Apple Banana Potassium
2 Apple Banana  Thiamine

> want
     x1     x2                   x3
1 Apple Banana Potassium / Thiamine

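x1 and x2 act like ID variables, and x3 holds a categorical value. A similar question was asked here: Condensing multiple observations on the same individual into a single row, adding multiples as new columns. However, when a categorical value is missing, that approach produces additional columns filled with NA. I tried converting the NA values to " " and pasting everything together, but the result is not what I want. When several NA values are replaced by spaces, it looks like this: "Potassium / / / "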

1 answer:

Answer 0: (score: 3)

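Here is my attempt. I modified your data by adding two rows with NA. I converted x3 to character and replaced NA with "". Using toString and summarise, I combined all the elements of x3. Finally, I changed , to / as described in your question.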

mydf <- structure(list(x1 = structure(c(1L, 1L, 1L, 1L), .Label = "Apple", class = "factor"), 
x2 = structure(c(1L, 1L, 1L, 1L), .Label = "Banana", class = "factor"), 
x3 = structure(c(1L, 2L, NA, NA), .Label = c("Potassium", 
"Thiamine"), class = "factor")), .Names = c("x1", "x2", "x3"
), class = "data.frame", row.names = c("1", "2", "3", "4"))

#     x1     x2        x3
#1 Apple Banana Potassium
#2 Apple Banana  Thiamine
#3 Apple Banana      <NA>
#4 Apple Banana      <NA>

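library(dplyr)

# Within each x1/x2 group, convert x3 to character and replace NA with "",
# then paste the group's values into one string separated by " / ".
mydf %>%
  group_by(x1, x2) %>%
  mutate(x3 = replace(as.character(x3), is.na(x3), "")) %>%
  summarise(x3 = paste(x3, collapse = " / "))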

#     x1     x2                         x3
#1 Apple Banana Potassium / Thiamine /  /
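
The output above still carries a separator for every NA that was turned into "", which is the pattern the question wanted to avoid. One possible variation, offered as a sketch rather than as part of the original answer, is to drop the NA values before pasting instead of replacing them. The data frame below is rebuilt with data.frame() purely so the example is self-contained, and the name `result` is only illustrative.

```r
library(dplyr)

# Same data as above: two real values and two NA rows for one x1/x2 pair.
mydf <- data.frame(
  x1 = c("Apple", "Apple", "Apple", "Apple"),
  x2 = c("Banana", "Banana", "Banana", "Banana"),
  x3 = c("Potassium", "Thiamine", NA, NA),
  stringsAsFactors = TRUE
)

# Drop NA values inside each group, then collapse what is left.
# A group whose x3 values are all NA ends up with an empty string.
result <- mydf %>%
  group_by(x1, x2) %>%
  summarise(x3 = paste(as.character(x3[!is.na(x3)]), collapse = " / "))

result
```

For this data the single remaining row should have x3 equal to "Potassium / Thiamine", with no trailing separators. Recent dplyr versions may also print a message about how the result is grouped; it does not affect the values.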