I have the following data frame:
df1 <- data.frame( id = c(1,2,2,3),
word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
value = c(10,12,20,5),
stringsAsFactors = FALSE
)
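Now I want to consolidate the rows by id. If an id occurs more than once, the values in the word column should be concatenated and the values in the value column should be summed. The resulting data frame should look like this: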
id | word | value
1 | house, garden, flower | 10
2 | flower, red, garden, tree, forest | 32
3 | house, window, door, red | 5
Does anyone have an idea how to solve this?
Answer 0 (score: 2)
In base R:
df1 <- data.frame( id = c(1,2,2,3),
word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
value = c(10,12,20,5),
stringsAsFactors = FALSE
)
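# tapply() returns one result per id in sorted id order,
# so the id column is sorted as well to stay aligned with it
want <- data.frame(id = sort(unique(df1$id)),
                   word = tapply(df1$word, df1$id, paste, collapse = ", "),
                   value = tapply(df1$value, df1$id, sum))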
want
id word value
1 1 house, garden, flower 10
2 2 flower, red, garden, tree, forest 32
3 3 house, window, door, red 5
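As a possible alternative in base R (not part of the original answer), the same grouping can be sketched with aggregate() and merge(). The intermediate names words, values and want2 are only illustrative. Because merge() joins on id, the columns stay aligned even if the ids in df1 are not sorted.

```r
df1 <- data.frame(
  id = c(1, 2, 2, 3),
  word = c("house, garden, flower", "flower, red",
           "garden, tree, forest", "house, window, door, red"),
  value = c(10, 12, 20, 5),
  stringsAsFactors = FALSE
)

# Concatenate the word strings within each id (collapse is passed on to paste)
words <- aggregate(word ~ id, data = df1, FUN = paste, collapse = ", ")

# Sum the value column within each id
values <- aggregate(value ~ id, data = df1, FUN = sum)

# Join the two per-id summaries on the shared id column
want2 <- merge(words, values, by = "id")
want2
```

This should give the same three rows as the tapply version above, at the cost of two aggregate() calls instead of one data.frame() call.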
Answer 1 (score: 0)
This is very simple with the tidyverse. Just group by id, then use the summarize function to build the per-group variables you need:
library(tidyverse)
df1 %>%
group_by(id) %>%
dplyr::summarize(word = paste0(word, collapse=", "),
value = sum(value))
id word value
<dbl> <chr> <dbl>
1 1 house, garden, flower 10
2 2 flower, red, garden, tree, forest 32
3 3 house, window, door, red 5
Answer 2 (score: 0)
Just use the dplyr package:
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(
word = paste(word, collapse = ', '),
value=sum(value)
)
Output:
# A tibble: 3 x 3
id word value
<dbl> <chr> <dbl>
1 1. house, garden, flower 10.
2 2. flower, red, garden, tree, forest 32.
3 3. house, window, door, red 5.