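I have a data frame (date, stock, price, size) and I want to aggregate it per stock: count the rows, sum the size, and get the minimum and maximum price. That part works, but the final output contains multiple rows for the same ticker. For example, DGAZ appears twice in the cand data frame, and it also appears twice in result_all. In result_all, I want one row per ticker.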
DGAZ in cand:
18 2017-01-18 DGAZ 3.74 836000
19 2017-01-18 DGAZ 3.76 500000
DGAZ in result_all:
date symbol print_count tot_shares min_price max_price
1 2017-01-18 DGAZ 2 1336000 3.74 3.76
2 2017-01-18 DGAZ 2 1336000 3.74 3.76
Any idea why this happens?
Note that I will eventually run this over multiple dates, which is why there are two do.call statements.
Reproducible code:
cand <- structure(list(date = structure(c(17184, 17184, 17184, 17184,
17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184,
17184, 17184, 17184, 17184, 17184, 17184), class = "Date"), stock = c("AAPL",
"ABB", "ABEV", "AMTD", "AXTA", "BNDX", "BWX", "BNDX", "BNPQY",
"BNPQY", "BPESF", "BTG", "BWX", "CLCD", "CMCSA", "CX", "DANDY",
"DGAZ", "DGAZ"), price = c(120, 22.41, 5.31, 46, 28.1, 54.06,
26.23, 54.08, 31.79, 31.79, 1.04, 2.86, 26.28, 46.3, 72.7, 8.28,
12.72, 3.74, 3.76), size = c(2350000L, 500000L, 500000L, 631400L,
553525L, 748655L, 1347888L, 711454L, 881744L, 881744L, 745808L,
700000L, 1296627L, 612347L, 840000L, 500000L, 650000L, 836000L,
500000L)), .Names = c("date", "stock", "price", "size"), row.names = c(NA,
19L), class = "data.frame")
result_all <- do.call(rbind, lapply(unique(cand$date), function(ind_date){
  temp <- cand[cand$date == ind_date,]
  result_day <- do.call(rbind, lapply(unique(temp$stock), function(ind_stock){
    temp2 <- temp[temp$stock == ind_stock,]
    print(data.frame(date=temp2$date, symbol=temp2$stock, print_count=nrow(temp2),
                     tot_shares = sum(temp2$size), min_price=min(temp2$price),
                     max_price=max(temp2$price)))
  }))
}))
print(result_all)
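One possible explanation, offered as a sketch rather than a confirmed diagnosis: temp2$date and temp2$stock contain one element per matching row, so data.frame() may be recycling the scalar summaries to that length. The example below assumes a hypothetical two-row data frame dgaz that mirrors the DGAZ rows above. The helper name summarise_stock is illustrative and not part of the original code.

```r
# Hypothetical two-row subset mirroring the DGAZ rows in cand.
dgaz <- data.frame(
  date  = as.Date(c("2017-01-18", "2017-01-18")),
  stock = c("DGAZ", "DGAZ"),
  price = c(3.74, 3.76),
  size  = c(836000L, 500000L),
  stringsAsFactors = FALSE
)

# Passing the length-2 columns makes data.frame() recycle the
# length-1 summaries, so this yields two identical rows.
recycled <- data.frame(date = dgaz$date, symbol = dgaz$stock,
                       print_count = nrow(dgaz),
                       tot_shares = sum(dgaz$size))

# Illustrative helper (the name is an assumption): take a single date
# and symbol per group so that every column has length 1.
summarise_stock <- function(rows) {
  data.frame(
    date        = rows$date[1],
    symbol      = rows$stock[1],
    print_count = nrow(rows),
    tot_shares  = sum(rows$size),
    min_price   = min(rows$price),
    max_price   = max(rows$price),
    stringsAsFactors = FALSE
  )
}

one_row <- summarise_stock(dgaz)
```

If that assumption holds, the same helper could be applied to the full data with something like do.call(rbind, lapply(split(cand, list(cand$date, cand$stock), drop = TRUE), summarise_stock)), which should give one row per date and ticker.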
Thanks.