Question

I am using ddply() on the dataset below:

 trans_id account_id   type operation amount addl_info      date1
    58738     184612 CREDIT      BTBC  99095    295583 2016-12-12
    58741     243549 CREDIT      BTBC   5624    330985 2016-11-27
    58746     305880 CREDIT      BTBC  80054    133380 2016-12-14
    58747     369453 CREDIT      BTBC  24814    415032 2016-12-16
    58749     558181 CREDIT      BTBC  83588    182996 2016-11-19
    58759     234023 CREDIT      BTBC  38106    374469 2016-12-10

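library(plyr)

# One row per account: transaction count, total amount, and average amount
nov_dec_accounts <- ddply(nov_dec, .variables = c("account_id"), summarise,
                   notrans = sum(table(account_id)),  # counts the rows in each group
                   Totalamount = sum(amount),
                   avgamount = Totalamount/notrans)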

Running the above code on about 1 million records to create the dataset below takes more than 5 minutes.

head(nov_dec_accounts)
  account_id notrans Totalamount avgamount
1     125781       2       51132     25566
2     125799       1       55461     55461
3     125801       1       56194     56194
4     125804       1       48830     48830
5     125808       1       89952     89952
6     125812       1       39544     39544

Is there a better alternative to using ddply() with summarise for my example?
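For reference, here is a minimal sketch of the same per-account summary written with the data.table package, which is commonly used for grouped aggregation on large tables. This is an assumption about what might help, not a measured result on the data above: the toy nov_dec below is a stand-in (the second 184612 row is invented so that one account has two transactions), and the actual speedup would need to be checked with system.time() on the real data.

```r
library(data.table)

# Toy stand-in for nov_dec; values are illustrative, not the real data.
# The repeated account_id 184612 is made up to show grouping.
nov_dec <- data.frame(
  trans_id   = c(58738, 58741, 58746, 58747),
  account_id = c(184612, 243549, 184612, 369453),
  amount     = c(99095, 5624, 80054, 24814)
)

dt <- as.data.table(nov_dec)

# .N is the number of rows in each group; keyby sorts the result by account_id
nov_dec_accounts <- dt[, .(
  notrans     = .N,
  Totalamount = sum(amount),
  avgamount   = mean(amount)
), keyby = account_id]

print(nov_dec_accounts)
```

Here .N replaces sum(table(account_id)), and mean(amount) gives the same value as Totalamount/notrans, so the columns match those produced by the ddply() call.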

Alternative to the ddply function
