How can I combine rows that share the same value in COL1 and format the merged COL2 values correctly (see the example)?
Input:
> df
COL1 COL2
1 b21 blabla1
2 b21 blabla2
3 b55 sdlafk
Desired output (2 rows, because the two b21 rows are merged into one):
COL1 COL2
1 b21 blabla1
blabla2
2 b55 sdlafk
Source data:
df <- structure(list(COL1 = structure(c(1L, 1L, 2L), .Label = c("b21",
"b55"), class = "factor"), COL2 = structure(1:3, .Label = c("blabla1",
"blabla2", "sdlafk"), class = "factor")), .Names = c("COL1",
"COL2"), class = "data.frame", row.names = c(NA, -3L))
Answer 0 (score: 0)
If you want COL2 stored as a list column:
data.table::setDT(df)[, .(COL2 = list(COL2)), .(COL1)]
COL1 COL2
1: b21 blabla1,blabla2
2: b55 sdlafk
If you want to collapse it into a single character string instead:
data.table::setDT(df)[, .(COL2 = paste(COL2, collapse = ",")), .(COL1)]
COL1 COL2
1: b21 blabla1,blabla2
2: b55 sdlafk
You can also use base R:
aggregate(COL2 ~ COL1, df, paste, collapse = ",")
COL1 COL2
1 b21 blabla1,blabla2
2 b55 sdlafk
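The desired output in the question puts each merged value on its own line rather than separating them with commas. One way to approximate that, as a sketch only, is to collapse with a newline; the example data is rebuilt here as character columns, which is an assumption (the question's `dput()` output uses factors), and `merged` is an illustrative name.

```r
# Example data as plain character columns (assumption: no factors).
df <- data.frame(COL1 = c("b21", "b21", "b55"),
                 COL2 = c("blabla1", "blabla2", "sdlafk"),
                 stringsAsFactors = FALSE)

# Collapse COL2 within each COL1 group, one value per line.
merged <- aggregate(COL2 ~ COL1, data = df,
                    FUN = function(x) paste(x, collapse = "\n"))

# cat() renders the embedded newline; print() would show it escaped as "\n".
cat(merged$COL2[merged$COL1 == "b21"], "\n")
```

Whether the line break shows up as intended depends on where the table is rendered; the console print method for data frames shows the escape sequence rather than a real line break.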
Answer 1 (score: 0)
There are several options, depending on your goal (presentation vs. storage):
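# stringsAsFactors = FALSE keeps both columns as character in R < 4.0 as well
df <- data.frame(COL1 = c("b21", "b21", "b55"),
                 COL2 = c("blabla1", "blabla2", "sdlafk"),
                 stringsAsFactors = FALSE)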
A simple list:
split(df$COL2, df$COL1)
# $b21
# [1] "blabla1" "blabla2"
# $b55
# [1] "sdlafk"
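If the list is only an intermediate step, each element can be collapsed back into one string. The following is a minimal base R sketch, assuming character columns; the variable names `groups` and `collapsed` are illustrative and not from the original answer.

```r
# Example data as plain character columns (assumption: no factors).
df <- data.frame(COL1 = c("b21", "b21", "b55"),
                 COL2 = c("blabla1", "blabla2", "sdlafk"),
                 stringsAsFactors = FALSE)

# One list element per distinct COL1 value.
groups <- split(df$COL2, df$COL1)

# Collapse every element to a single comma-separated string.
# vapply() guarantees a character vector with one entry per group.
collapsed <- vapply(groups, paste, character(1), collapse = ",")
```

The result is a named character vector, so `collapsed[["b21"]]` looks up a single group by its key.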
For presentation only:
# as.character() keeps the labels even if COL1 is a factor
within(df, { COL1 <- ifelse(duplicated(COL1), "", as.character(COL1)) })
# COL1 COL2
# 1 b21 blabla1
# 2 blabla2
# 3 b55 sdlafk
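Note that `duplicated()` flags every later repeat, not only adjacent ones, so the blanking above only reads correctly when equal keys sit next to each other. A small sketch that sorts first is shown here; `blank_repeats` is a hypothetical helper name, not part of base R or of the original answer.

```r
# Hypothetical helper: sort by the key column, then blank repeated keys.
blank_repeats <- function(data, col) {
  # Sort so that equal keys are adjacent (order() keeps ties in input order).
  data <- data[order(data[[col]]), , drop = FALSE]
  # Work on character values so factor columns do not turn into integer codes.
  keys <- as.character(data[[col]])
  data[[col]] <- ifelse(duplicated(keys), "", keys)
  rownames(data) <- NULL
  data
}

# Unsorted input: the two b21 rows are not adjacent.
df <- data.frame(COL1 = c("b21", "b55", "b21"),
                 COL2 = c("blabla1", "sdlafk", "blabla2"),
                 stringsAsFactors = FALSE)
shown <- blank_repeats(df, "COL1")
```

As with the one-liner above, the result is meant for display; the blanked cells lose the key, so it should not be used for further grouping.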
dplyr (to complement @Psidom's data.table answer):
library(dplyr)
df %>%
group_by(COL1) %>%
summarize(COL2 = paste(COL2, collapse = ","))
# Source: local data frame [2 x 2]
# COL1 COL2
# <chr> <chr>
# 1 b21 blabla1,blabla2
# 2 b55 sdlafk