Summarizing a data frame by columns in R

Time: 2014-04-01 10:24:00

Tags: r

I am trying to display the following data frame as totals per city:

> summary(dat1)
      Date                 City           Sales        
 Min.   :2010-06-18   Min.   : 1.00   Min.   :  667.4  
 1st Qu.:2011-02-18   1st Qu.:18.00   1st Qu.: 1138.6  
 Median :2011-10-28   Median :37.00   Median : 1507.5  
 Mean   :2011-10-29   Mean   :44.26   Mean   : 2065.4  
 3rd Qu.:2012-07-06   3rd Qu.:74.00   3rd Qu.: 2347.1  
 Max.   :2013-03-08   Max.   :99.00   Max.   :47206.6 

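That is, I would like to obtain a data frame with one observation per Date x City combination, showing the total sales for each city on each day.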

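2 answers:

Answer 0: (score: 1)

There are several possibilities. To name a few:

  1. The function aggregate():

    i) aggregate(Sales~Date+City, data=df, sum)

    ii) aggregate(df$Sales, list(df$Date,df$City), sum)

  2. The function tapply():

    i) tapply(df$Sales, list(df$Date, df$City), sum)

  3. The function tapply() is particularly useful if you have a large data set, because aggregate() tends to choke on very large data sets, whereas tapply() usually handles them more gracefully. Also, tapply() and aggregate() produce output in different formats, so you may want to pick whichever best suits any further analysis.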

    These examples can be tested on the mock data given below:

    df<-structure(list(Date = structure(c(4L, 2L, 4L, 2L, 3L, 4L, 3L, 
    2L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 2L, 3L, 4L, 2L, 3L, 3L, 4L, 
    3L, 4L, 2L, 2L, 2L, 3L, 1L, 1L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 3L, 
    2L, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 
    1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 4L, 2L, 1L, 3L, 3L, 1L, 4L, 
    1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("2014-01-01", "2014-02-01", 
    "2014-03-01", "2014-04-01"), class = "factor"), City = structure(c(1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 
    17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
    18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("a", 
    "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
    "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
        Sales = c(100, 100, 93, 92, 95, 115, 104, 106, 113, 94, 93, 
        98, 116, 85, 98, 97, 103, 110, 105, 104, 107, 86, 92, 94, 
        106, 115, 112, 92, 103, 100, 101, 97, 95, 110, 103, 92, 91, 
        98, 100, 93, 108, 87, 96, 101, 87, 111, 90, 94, 110, 95, 
        110, 101, 88, 99, 106, 117, 101, 120, 92, 86, 118, 104, 99, 
        89, 103, 102, 121, 99, 106, 99, 107, 105, 109, 110, 112, 
        94, 100, 112)), .Names = c("Date", "City", "Sales"), row.names = c(NA, 
        -78L), class = "data.frame")
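
To make the difference in output format concrete, here is a minimal sketch on a tiny made-up data frame. The name `toy` and its values are assumptions for illustration only, not the mock data above. It should show that aggregate() gives a long data frame while tapply() gives a Date-by-City matrix.

```r
# Hypothetical toy data (not the poster's data): two dates, two cities.
toy <- data.frame(
  Date  = c("2014-01-01", "2014-01-01", "2014-01-01", "2014-02-01"),
  City  = c("a", "a", "b", "a"),
  Sales = c(100, 50, 70, 30)
)

# aggregate() returns a long data frame with one row per
# Date/City combination that actually occurs in the data.
long <- aggregate(Sales ~ Date + City, data = toy, FUN = sum)

# tapply() returns a matrix with dates as rows and cities as columns;
# combinations that never occur are filled with NA.
wide <- tapply(toy$Sales, list(toy$Date, toy$City), sum)

print(long)
print(wide)
```

The long form is usually easier to merge or plot, while the matrix form is convenient for looking up a single date and city, e.g. `wide["2014-01-01", "a"]`.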
    

Answer 1: (score: 0)

See the aggregate() function:

aggregate(Sales~Date+City, data=dat1, sum)
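
The summary in the question suggests that Date is a Date column and City is a numeric code, but the real dat1 is not shown. The sketch below therefore uses a hypothetical stand-in with made-up values, only to illustrate the call above and a per-city variant.

```r
# Hypothetical stand-in for dat1 (the real data is not shown in the question).
dat1 <- data.frame(
  Date  = as.Date(c("2010-06-18", "2010-06-18", "2010-06-18", "2010-06-25")),
  City  = c(1, 1, 2, 1),
  Sales = c(700, 300, 1500, 900)
)

# Total sales per date and city: one row per combination present in the data.
by_day_city <- aggregate(Sales ~ Date + City, data = dat1, FUN = sum)

# Total sales per city across all dates: drop Date from the formula.
by_city <- aggregate(Sales ~ City, data = dat1, FUN = sum)

print(by_day_city)
print(by_city)
```

Dropping a variable from the right-hand side of the formula sums over it, so the same function covers both the per-day and the overall per-city totals.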