Summing and grouping by columns

Time: 2015-07-25 15:03:12

Tags: r

I have the following data frame.

Value Type Year
10     car 1995
23     car 1995
2      car 1997
10     car 2000
11     bus 1997
23     bus 1995
2      bus 1997
10     bus 2000
12     car 1997
13     bus 1995
14     jeep 2000
15     jeep 1995
23     jeep 1995
2      jeep 1997
10     jeep 2000
8      car  2000
9      bus  2000
1      jeep 1997

I want to sum the rows, grouping first by the Type column and then by Year. I would like the following output.

Value Type Year
   33    car  1995
   14    car  1997
   18    car  2000  
   36    bus  1995
   13    bus  1997
   19    bus  2000
   38    jeep 1995
    3    jeep 1997
   24    jeep 2000

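Can anyone tell me how to get this?

2 Answers:

Answer 0: (score: 5)

We can use one of the aggregate-by-group functions. If we are only interested in base R, aggregate is a useful, compact function.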

aggregate(Value ~ Year + Type, df1, FUN=sum)
# Year Type Value
#1 1995  car    33
#2 1997  car    14
#3 2000  car    18
#4 1995  bus    36
#5 1997  bus    13
#6 2000  bus    19
#7 1995 jeep    38
#8 1997 jeep     3
#9 2000 jeep    24
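
Note that aggregate returns the columns in the order Year, Type, Value, while the question asks for Value, Type, Year. A minimal sketch of reordering them, assuming the same sample data as in the "Data" section (the name res is only illustrative):

```r
# Rebuild the sample data from the question (same values as the "Data" section).
df1 <- data.frame(
  Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L, 12L, 13L,
            14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L),
  Type = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus",
           "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L, 2000L, 1997L, 1995L,
           2000L, 1995L, 1995L, 1997L, 2000L, 2000L, 2000L, 1997L),
  stringsAsFactors = FALSE
)
# Keep Type in order of first appearance (car, bus, jeep), as in the question.
df1$Type <- factor(df1$Type, levels = unique(df1$Type))

# Sum Value within each Type/Year combination.
res <- aggregate(Value ~ Year + Type, data = df1, FUN = sum)

# Reorder the columns to the layout requested in the question.
res <- res[, c("Value", "Type", "Year")]
res
```

The rows come out sorted by Type and then Year because Year is listed first on the right-hand side of the formula and therefore varies fastest.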

Or we can try dplyr

library(dplyr)
df1 %>%
   group_by(Type, Year) %>%
   summarise(Value=sum(Value))
#   Type Year Value
#1  car 1995    33
#2  car 1997    14
#3  car 2000    18
#4  bus 1995    36
#5  bus 1997    13
#6  bus 2000    19
#7 jeep 1995    38
#8 jeep 1997     3
#9 jeep 2000    24

Another compact and fast option is data.table

library(data.table)#v1.9.5+
setDT(df1)[, list(Value=sum(Value)), .(Type, Year)]
#Type Year Value
#1:  car 1995    33
#2:  car 1997    14
#3:  car 2000    18
#4:  bus 1997    13
#5:  bus 1995    36
#6:  bus 2000    19
#7: jeep 2000    24
#8: jeep 1995    38
#9: jeep 1997     3
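
Two details of the call above may matter in practice: setDT converts df1 to a data.table by reference, and by= returns groups in order of first appearance, which is why bus 1997 is listed before bus 1995. A sketch of a variant that leaves df1 untouched and sorts the groups, assuming the same sample data (dt and res are illustrative names):

```r
library(data.table)

# Same sample data as in the "Data" section.
df1 <- data.frame(
  Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L, 12L, 13L,
            14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L),
  Type = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus",
           "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L, 2000L, 1997L, 1995L,
           2000L, 1995L, 1995L, 1997L, 2000L, 2000L, 2000L, 1997L),
  stringsAsFactors = FALSE
)
df1$Type <- factor(df1$Type, levels = unique(df1$Type))

# as.data.table() makes a copy, so df1 stays a plain data.frame.
dt <- as.data.table(df1)

# keyby= groups like by=, but also sorts the result by the grouping columns
# (factor columns are sorted by level order: car, bus, jeep).
res <- dt[, .(Value = sum(Value)), keyby = .(Type, Year)]
res
```

With keyby the rows should appear in the same order as the output requested in the question.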

Or a solution based on sqldf

library(sqldf)
sqldf('select Type, Year,
        sum(Value) as Value 
        from df1 
        group by Type, Year')

Update

If we want to plot the result

 library(ggplot2)
 df1 %>%
   group_by(Type, Year) %>%
   summarise(Value=sum(Value))  %>%
   ggplot(., aes(x=Year, y=Value))+
          geom_line() + 
          facet_wrap(~Type)

Data

  df1 <- structure(list(Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L, 
  12L, 13L, 14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L), Type = c("car", 
  "car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus", 
  "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"), 
  Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L, 
  2000L, 1997L, 1995L, 2000L, 1995L, 1995L, 1997L, 2000L, 2000L, 
  2000L, 1997L)), .Names = c("Value", "Type", "Year"), 
  class =    "data.frame", row.names = c(NA, -18L))

   df1$Type <- factor(df1$Type, levels=unique(df1$Type))

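Answer 1: (score: 0)

I know this is not the desired output (and honestly, I don't like it, because it does not follow tidy data principles), but here is another, different solution. You might consider it.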

 tapply(df$Value, list(df$Type, df$Year), sum)
     1995 1997 2000
bus    36   13   19
car    33   14   18
jeep   38    3   24
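
If the long layout from the question is needed after all, the wide table can be converted back in base R. A minimal sketch, assuming df holds the question's data with Type as a plain character column (tab and long are illustrative names):

```r
# Same sample data as in the question, here named df as in this answer.
df <- data.frame(
  Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L, 12L, 13L,
            14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L),
  Type = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus",
           "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L, 2000L, 1997L, 1995L,
           2000L, 1995L, 1995L, 1997L, 2000L, 2000L, 2000L, 1997L),
  stringsAsFactors = FALSE
)

# Naming the list elements gives the resulting matrix named dimensions.
tab <- tapply(df$Value, list(Type = df$Type, Year = df$Year), sum)

# as.table() + as.data.frame() turns the matrix into one row per cell.
long <- as.data.frame(as.table(tab), responseName = "Value",
                      stringsAsFactors = FALSE)

# The year labels come back as character strings; restore integers.
long$Year <- as.integer(long$Year)
long
```

The rows are ordered with Type varying fastest within each Year, so a final order(long$Type, long$Year) may be wanted to match the question's layout.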