Aggregating with do.call in R creates multiple rows per ticker

Posted: 2017-02-11 22:14:13

Tags: r dataframe aggregate lapply do.call

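I have a data frame (date, stock, price, size) and I want to aggregate it: count the number of instances, sum the size, and get the minimum and maximum price. That part works, but the final output contains multiple rows for the same ticker. For example, DGAZ appears twice in the cand data frame, and it also appears twice in result_all. In result_all I want one row per ticker symbol.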
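The aggregation described above maps onto base R's `aggregate()` grouped by both date and ticker, which also covers multiple dates without a separate loop. This is a minimal sketch, not the asker's code: `cand_small` is a hypothetical five-row subset of the question's `cand` (the AAPL, BNDX and DGAZ rows), and each statistic comes from its own `aggregate()` call using the same `by` list, so the groups come back in the same order each time.

```r
# Hypothetical trimmed copy of the question's `cand` (AAPL, BNDX and DGAZ rows only)
cand_small <- data.frame(
  date  = as.Date(rep("2017-01-18", 5)),
  stock = c("AAPL", "BNDX", "BNDX", "DGAZ", "DGAZ"),
  price = c(120, 54.06, 54.08, 3.74, 3.76),
  size  = c(2350000L, 748655L, 711454L, 836000L, 500000L),
  stringsAsFactors = FALSE
)

# Group by date AND ticker: one output row per (date, ticker) combination
by_key <- list(date = cand_small$date, symbol = cand_small$stock)

# Same grouping in every call, so the result rows line up
summary_df <- aggregate(cand_small["size"], by = by_key, FUN = length)
names(summary_df)[names(summary_df) == "size"] <- "print_count"
summary_df$tot_shares <- aggregate(cand_small["size"], by = by_key, FUN = sum)$size
summary_df$min_price  <- aggregate(cand_small["price"], by = by_key, FUN = min)$price
summary_df$max_price  <- aggregate(cand_small["price"], by = by_key, FUN = max)$price

print(summary_df)
```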

DGAZ in cand:

18 2017-01-18  DGAZ   3.74  836000
19 2017-01-18  DGAZ   3.76  500000

DGAZ in result_all:

        date symbol print_count tot_shares min_price max_price
1 2017-01-18   DGAZ           2    1336000      3.74      3.76
2 2017-01-18   DGAZ           2    1336000      3.74      3.76

Any idea why?
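One likely cause, offered as an editor's sketch rather than something stated in the question: inside the inner function of the reproducible code, `temp2$date` and `temp2$stock` have one element per matching row (two for DGAZ), while `nrow()`, `sum()`, `min()` and `max()` each return a single value. `data.frame()` recycles the length-1 columns to the length of the longest column, so a ticker that appears n times yields n identical rows. The values below are DGAZ's, taken from the output above.

```r
# Two-element columns (one per DGAZ row in cand) mixed with length-1 summaries
d <- data.frame(date        = as.Date(c("2017-01-18", "2017-01-18")),
                symbol      = c("DGAZ", "DGAZ"),
                print_count = 2L,
                tot_shares  = 836000L + 500000L,
                min_price   = min(c(3.74, 3.76)),
                max_price   = max(c(3.74, 3.76)))

# The scalar columns are recycled to length 2, so the summary appears twice
print(nrow(d))
```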

Note that I will eventually run this over multiple dates, which is why there are two do.call statements.

Reproducible code:

cand <- structure(list(date = structure(c(17184, 17184, 17184, 17184, 
17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184, 
17184, 17184, 17184, 17184, 17184, 17184), class = "Date"), stock = c("AAPL", 
"ABB", "ABEV", "AMTD", "AXTA", "BNDX", "BWX", "BNDX", "BNPQY", 
"BNPQY", "BPESF", "BTG", "BWX", "CLCD", "CMCSA", "CX", "DANDY", 
"DGAZ", "DGAZ"), price = c(120, 22.41, 5.31, 46, 28.1, 54.06, 
26.23, 54.08, 31.79, 31.79, 1.04, 2.86, 26.28, 46.3, 72.7, 8.28, 
12.72, 3.74, 3.76), size = c(2350000L, 500000L, 500000L, 631400L, 
553525L, 748655L, 1347888L, 711454L, 881744L, 881744L, 745808L, 
700000L, 1296627L, 612347L, 840000L, 500000L, 650000L, 836000L, 
500000L)), .Names = c("date", "stock", "price", "size"), row.names = c(NA, 
19L), class = "data.frame")

result_all <- do.call(rbind, lapply(unique(cand$date), function(ind_date){
  temp <- cand[cand$date == ind_date,]
  result_day <- do.call(rbind, lapply(unique(temp$stock), function(ind_stock){
    temp2 <- temp[temp$stock == ind_stock,]
    print(data.frame(date=temp2$date, symbol=temp2$stock, print_count=nrow(temp2), 
                     tot_shares = sum(temp2$size), min_price=min(temp2$price), 
                     max_price=max(temp2$price)))
  }))
}))

print(result_all)
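A sketch of the same nested `do.call(rbind, lapply(...))` structure with one change, assuming the recycling explanation is correct: the date and ticker columns are built from the length-1 loop variables `ind_date` and `ind_stock` instead of the full `temp2$date` and `temp2$stock` vectors, so each group contributes exactly one row. It also returns the data frame directly instead of wrapping it in `print()`. To stay self-contained it runs on a hypothetical five-row subset (`cand_small`) of the question's data; with the full `cand` it should give one row per ticker per date.

```r
# Hypothetical trimmed copy of the question's `cand` (AAPL, BNDX and DGAZ rows only)
cand_small <- data.frame(
  date  = as.Date(rep("2017-01-18", 5)),
  stock = c("AAPL", "BNDX", "BNDX", "DGAZ", "DGAZ"),
  price = c(120, 54.06, 54.08, 3.74, 3.76),
  size  = c(2350000L, 748655L, 711454L, 836000L, 500000L),
  stringsAsFactors = FALSE
)

result_all <- do.call(rbind, lapply(unique(cand_small$date), function(ind_date) {
  temp <- cand_small[cand_small$date == ind_date, ]
  do.call(rbind, lapply(unique(temp$stock), function(ind_stock) {
    temp2 <- temp[temp$stock == ind_stock, ]
    # Length-1 loop variables instead of temp2$date / temp2$stock:
    # data.frame() has nothing to recycle, so it builds exactly one row
    data.frame(date        = ind_date,
               symbol      = ind_stock,
               print_count = nrow(temp2),
               tot_shares  = sum(temp2$size),
               min_price   = min(temp2$price),
               max_price   = max(temp2$price),
               stringsAsFactors = FALSE)
  }))
}))

print(result_all)
```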

Thanks.
