I have the following df, with redundant col3 values "bbb" and "ddd":
col1 col2 col3
u1 1 aaa
u1 1 bbb
u1 1 bbb
u1 1 bbb
u1 1 ccc
u1 -1 ddd
u1 -1 ddd
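For anyone who wants to reproduce this, here is a minimal sketch of the input as a data frame. It assumes col2 is numeric and that col1 and col3 are plain character columns, which the post does not state.

```r
# Sample input from the question (column types are an assumption).
df <- data.frame(
  col1 = rep("u1", 7),
  col2 = c(1, 1, 1, 1, 1, -1, -1),
  col3 = c("aaa", "bbb", "bbb", "bbb", "ccc", "ddd", "ddd"),
  stringsAsFactors = FALSE  # keep col1 and col3 as character, not factor
)
```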
I would like to create the following df, where the rows with a redundant col3 value are replaced by a single row whose col2 is the sum:
col1 col2 col3
u1 1 aaa
u1 3 bbb
u1 1 ccc
u1 -2 ddd
Thanks in advance.
Answer 0 (score: 2)
Try
library(dplyr)
df %>%
group_by(col1, col3) %>%
summarise(col2=sum(col2))
# col1 col3 col2
#1 u1 aaa 1
#2 u1 bbb 3
#3 u1 ccc 1
#4 u1 ddd -2
Or using data.table
library(data.table)
setDT(df)[, list(col2=sum(col2)), by=list(col1, col3)]
Or using sqldf
library(sqldf)
sqldf('SELECT col1, col3,
sum(col2) as col2
from df
group by col1, col3')
# col1 col3 col2
#1 u1 aaa 1
#2 u1 bbb 3
#3 u1 ccc 1
#4 u1 ddd -2
Or using base R
aggregate(.~col1+col3, df, sum)
# col1 col3 col2
#1 u1 aaa 1
#2 u1 bbb 3
#3 u1 ccc 1
#4 u1 ddd -2
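As a self-contained sketch of the base R approach, the snippet below rebuilds the sample data (the column types are an assumption) and names col2 explicitly in the formula instead of using `.`. With `.`, every column not used for grouping is summed, so the explicit form may be safer if df has further columns. The name res is only for illustration.

```r
# Sample input from the question (column types are an assumption).
df <- data.frame(
  col1 = rep("u1", 7),
  col2 = c(1, 1, 1, 1, 1, -1, -1),
  col3 = c("aaa", "bbb", "bbb", "bbb", "ccc", "ddd", "ddd"),
  stringsAsFactors = FALSE
)

# Sum col2 within each (col1, col3) group; only col2 is aggregated.
res <- aggregate(col2 ~ col1 + col3, data = df, FUN = sum)

# Put the columns back in the order the question asks for.
res <- res[, c("col1", "col2", "col3")]

# res now holds one row per col3 value: aaa = 1, bbb = 3, ccc = 1, ddd = -2
print(res)
```

aggregate returns the groups sorted by the grouping columns, which here matches the order shown in the question.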