Aggregate numeric column values for duplicated row IDs and keep the first occurrence of non-numeric columns

Time: 2017-08-16 14:43:05

Tags: r

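My data is shown below. I want one row per ID in the output, with Sales summed and the first Method value kept: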

ID  Method  Sales
1   Call    10
2   Visit   20
3   Call    10
2   Visit   5
5   Call    5
1   Call    10
2   Visit   15

I can aggregate Sales by ID, but I don't know how to bring Method along.

3 Answers:

Answer 0: (score: 1)

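A general solution (following your title) that
  1. sums all numeric variables and
  2. keeps the first value of any non-numeric variable:

    library(dplyr)
    # Replace each numeric column with its per-ID total, then keep the first row of each ID
    df %>% group_by(ID) %>% mutate_if(is.numeric, sum) %>% slice(1)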
    

gives:

    # A tibble: 4 x 3
    # Groups:   ID [4]
         ID Method Sales
      <int>  <chr> <int>
    1     1   Call    20
    2     2  Visit    40
    3     3   Call    10
    4     5   Call     5
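
The same idea can also be written with `summarise()` and `across()`. This is only a sketch and assumes dplyr 1.0.0 or later; the `dat` data frame is a copy of the question's data with Method stored as character, and `res` is a name chosen here for illustration.

```r
library(dplyr)

# Copy of the question's data (Method as character rather than factor)
dat <- data.frame(
  ID     = c(1L, 2L, 3L, 2L, 5L, 1L, 2L),
  Method = c("Call", "Visit", "Call", "Visit", "Call", "Call", "Visit"),
  Sales  = c(10L, 20L, 10L, 5L, 5L, 10L, 15L),
  stringsAsFactors = FALSE
)

res <- dat %>%
  group_by(ID) %>%
  summarise(
    across(where(is.numeric), sum),     # sum every numeric column (the grouping column is skipped)
    across(!where(is.numeric), first)   # keep the first value of every other column
  )
```

Unlike the `mutate_if()` version, this returns the summed columns before the non-numeric ones, so the column order is ID, Sales, Method.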
    

Answer 1: (score: 0)

Using group_by and summarise from the dplyr package:

library(dplyr)
# DF is the question's data frame
DF %>% group_by(ID) %>% dplyr::summarise(Method = first(Method), Sales = sum(Sales))

# A tibble: 4 x 3
     ID Method Sales
  <int>  <chr> <int>
1     1   Call    20
2     2  Visit    40
3     3   Call    10
4     5   Call     5

Edit, following your additional requirement: using @lmo's dput

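library(dplyr)
num <- sapply(dat, is.numeric)                         # TRUE for the numeric columns (ID, Sales)
dat1 <- dat[, num, drop = FALSE]                       # numeric columns, including ID
dat2 <- data.frame(dat[, !num, drop = FALSE], dat$ID)  # non-numeric columns plus the key (named dat.ID)
dat1 <- dat1 %>% group_by(ID) %>% dplyr::summarise_all(sum)
dat2 <- dat2 %>% group_by(dat.ID) %>% dplyr::summarise_all(first)
result <- cbind(dat1, dat2)   # both results are sorted by ID, so the rows line up
result$dat.ID <- NULL         # drop the duplicated key column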

Answer 2: (score: 0)

A solution in base R is to compute the required values separately and merge the results:

merge(aggregate(Method~ID, dat, head, 1), aggregate(Sales~ID, dat, sum), by="ID")
  ID Method Sales
1  1   Call    20
2  2  Visit    40
3  3   Call    10
4  5   Call     5
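
If there are many columns, writing one `aggregate()` call per column gets tedious. The following is a minimal base R sketch of an alternative, not part of the original answer: it overwrites each numeric column with its per-ID total using `ave()` and then keeps the first row of each ID with `duplicated()`. The names `num_cols` and `out` are chosen here for illustration, and `dat` is a copy of the question's data.

```r
# Copy of the question's data
dat <- data.frame(
  ID     = c(1L, 2L, 3L, 2L, 5L, 1L, 2L),
  Method = c("Call", "Visit", "Call", "Visit", "Call", "Call", "Visit"),
  Sales  = c(10L, 20L, 10L, 5L, 5L, 10L, 15L),
  stringsAsFactors = FALSE
)

# Numeric columns other than the key
num_cols <- setdiff(names(dat)[sapply(dat, is.numeric)], "ID")

out <- dat
for (col in num_cols) {
  # Replace each value with the total of its ID group
  out[[col]] <- ave(dat[[col]], dat$ID, FUN = sum)
}

# Keep the first row of each ID; non-numeric columns keep their first value
out <- out[!duplicated(out$ID), ]
out <- out[order(out$ID), ]
```

The result keeps the original column order (ID, Method, Sales) and the row names of the first occurrence of each ID.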

With data.table, the solution is

library(data.table)
setDT(dat)[, .(Method=first(Method), Sales=sum(Sales)), by=ID]
   ID Method Sales
1:  1   Call    20
2:  2  Visit    40
3:  3   Call    10
4:  5   Call     5
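
To avoid naming each column, the data.table call can be generalized with `.SD`. This is a sketch under the assumption that every non-key column should be either summed (numeric) or reduced to its first value (anything else); `dt` and `res` are names chosen here for illustration, and `as.data.table()` is used so the original `dat` is not modified by reference.

```r
library(data.table)

# Copy of the question's data
dat <- data.frame(
  ID     = c(1L, 2L, 3L, 2L, 5L, 1L, 2L),
  Method = c("Call", "Visit", "Call", "Visit", "Call", "Call", "Visit"),
  Sales  = c(10L, 20L, 10L, 5L, 5L, 10L, 15L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(dat)

# For each ID, sum numeric columns and take the first value of all others
res <- dt[, lapply(.SD, function(x) if (is.numeric(x)) sum(x) else x[1L]), by = ID]
```

Groups are returned in order of first appearance, which for this data is ID 1, 2, 3, 5.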

Data

dat <- 
structure(list(ID = c(1L, 2L, 3L, 2L, 5L, 1L, 2L), Method = structure(c(1L, 
2L, 1L, 2L, 1L, 1L, 2L), .Label = c("Call", "Visit"), class = "factor"), 
    Sales = c(10L, 20L, 10L, 5L, 5L, 10L, 15L)), .Names = c("ID", 
"Method", "Sales"), class = "data.frame", row.names = c(NA, -7L
))