Combine 2 similar rows on column 1 and merge the second-column text in R

Time: 2016-06-22 14:48:20

Tags: r merge row

How can I combine rows that have the same value in COL1 and correctly format the merged COL2 (see the example)?

Input:

  > df
      COL1    COL2
    1  b21 blabla1
    2  b21 blabla2
    3  b55  sdlafk

Desired output (2 rows, because the b21 rows are merged into 1 row):

      COL1    COL2
    1  b21 blabla1
           blabla2
    2  b55  sdlafk

Source data:

df <- structure(list(COL1 = structure(c(1L, 1L, 2L), .Label = c("b21", 
"b55"), class = "factor"), COL2 = structure(1:3, .Label = c("blabla1", 
"blabla2", "sdlafk"), class = "factor")), .Names = c("COL1", 
"COL2"), class = "data.frame", row.names = c(NA, -3L))

2 answers:

Answer 0 (score: 0)

If you want COL2 stored as a list:

data.table::setDT(df)[, .(COL2 = list(COL2)), .(COL1)]
   COL1            COL2
1:  b21 blabla1,blabla2
2:  b55          sdlafk

If you want to convert it to a character string:

data.table::setDT(df)[, .(COL2 = paste(COL2, collapse = ",")), .(COL1)]
   COL1            COL2
1:  b21 blabla1,blabla2
2:  b55          sdlafk

You can also use base R:

aggregate(COL2 ~ COL1, df, paste, collapse = ",")
  COL1            COL2
1  b21 blabla1,blabla2
2  b55          sdlafk
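The desired output in the question shows both values stacked inside one cell rather than joined by a comma. One possible way to approximate that, as a minimal sketch, is to collapse with a newline instead. The name `merged` is illustrative, and the data frame is rebuilt here with character columns as an assumption to keep the example self-contained.

```r
# Same data as in the question, with character columns for simplicity.
df <- data.frame(COL1 = c("b21", "b21", "b55"),
                 COL2 = c("blabla1", "blabla2", "sdlafk"),
                 stringsAsFactors = FALSE)

# Collapse COL2 within each COL1 group, separating values with a newline.
merged <- aggregate(COL2 ~ COL1, data = df, FUN = paste, collapse = "\n")

# cat() renders the embedded newline; print() shows it escaped as "\n".
cat(merged$COL2[1], "\n")
```

Whether the newline shows up as a line break depends on where the result is displayed (console printing escapes it, many report and table tools honor it).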

Answer 1 (score: 0)

There are several options, depending on your goal (presentation vs. storage):

 df <- data.frame(COL1 = c("b21", "b21", "b55"),
                  COL2 = c("blabla1", "blabla2", "sdlafk"),
                  stringsAsFactors = FALSE)  # keep strings as character (only the default since R 4.0)

A simple list:

split(df$COL2, df$COL1)
# $b21
# [1] "blabla1" "blabla2"
# $b55
# [1] "sdlafk"

For presentation only:

within(df, { COL1 = ifelse(duplicated(COL1), "", COL1) })
#   COL1    COL2
# 1  b21 blabla1
# 2      blabla2
# 3  b55  sdlafk
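If COL1 is a factor, as in the question's `dput()` output, `ifelse()` would return the underlying integer codes rather than the labels. A hedged variant that converts to character first might look like the sketch below; the names `df_f` and `shown` are illustrative, and it assumes rows of the same group are already adjacent.

```r
# Factor columns, mirroring the structure of the question's data.
df_f <- data.frame(COL1 = c("b21", "b21", "b55"),
                   COL2 = c("blabla1", "blabla2", "sdlafk"),
                   stringsAsFactors = TRUE)

# Blank out repeated keys; as.character() keeps the labels, not the codes.
shown <- transform(df_f,
                   COL1 = ifelse(duplicated(COL1), "", as.character(COL1)))
print(shown)
```

If the rows are not grouped, sorting by COL1 first (for example `df_f[order(df_f$COL1), ]`) keeps each blanked row directly under its key.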

dplyr (to complement @Psidom's data.table answer):

library(dplyr)
df %>%
  group_by(COL1) %>%
  summarize(COL2 = paste(COL2, collapse = ","))
# Source: local data frame [2 x 2]
#    COL1            COL2
#   <chr>           <chr>
# 1   b21 blabla1,blabla2
# 2   b55          sdlafk