R ddply: sum only selected/specific/logical rows

Date: 2015-11-13 08:05:40

Tags: r dataframe plyr

I have a database of client loans, and I want to do a ddply summary by LoanRefId:

    LoanRefId               Tran_Type TransactionAmount
103        11               LoanIssue         1000.0000
104        11           InitiationFee          171.0000
105        11                Interest           59.6729
106        11       AdministrationFee           64.9332
107        11 RaisedClientInstallment         1295.5757
108        11       ClientInstallment         1295.4700
109        11                  PaidUp            0.0000
110        11              Adjustment            0.1361
111        11                  PaidUp            0.0000
112        12               LoanIssue         3000.0000
113        12           InitiationFee          399.0000
114        12                Interest           94.9858
115        12       AdministrationFee           38.6975
116        12 RaisedClientInstallment         3532.6350
117        12       ClientInstallment         3532.6100
118        12                  PaidUp            0.0000
119        12              Adjustment            0.0733
120        12                  PaidUp            0.0000
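To make the examples reproducible, here is a minimal sketch that rebuilds the sample data above as a data frame named `test` (the name used in the question; the answers call the same data `df` or `df1`). The values are copied from the printout; `stringsAsFactors = FALSE` is an assumption, and the filtering code behaves the same with factor columns.

```r
# Rebuild the question's sample data; row names 103:120 match the printout.
test <- data.frame(
  LoanRefId = rep(c(11L, 12L), each = 9),
  Tran_Type = rep(c("LoanIssue", "InitiationFee", "Interest", "AdministrationFee",
                    "RaisedClientInstallment", "ClientInstallment", "PaidUp",
                    "Adjustment", "PaidUp"), times = 2),
  TransactionAmount = c(1000, 171, 59.6729, 64.9332, 1295.5757, 1295.47, 0, 0.1361, 0,
                        3000, 399, 94.9858, 38.6975, 3532.635, 3532.61, 0, 0.0733, 0),
  row.names = 103:120,
  stringsAsFactors = FALSE
)
head(test)
```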

However, for each LoanRefId I only want to sum certain rows. Specifically, I only want to sum the rows where Tran_Type == "ClientInstallment".

The only approach I could think of (which doesn't seem to work) is:

> ddply(test, c("LoanRefId"), summarise, cash_in = sum(test[test$Tran_Type == "ClientInstallment","TransactionAmount"]))

  LoanRefId cash_in
1        11 4828.08
2        12 4828.08

This is not the sum per LoanRefId; it just adds up all the amounts where Tran_Type == "ClientInstallment", which is wrong.
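The reason is that `summarise` evaluates its expressions inside each group's piece of the data, but `test[...]` explicitly refers to the full `test` object, so the grouping is ignored and every loan gets the same grand total (1295.47 + 3532.61 = 4828.08). A minimal base R sketch of the difference, using a trimmed copy of the data (only the rows needed to show the effect):

```r
# Trimmed copy of the question's data so the snippet runs on its own.
test <- data.frame(
  LoanRefId = c(11L, 11L, 12L, 12L),
  Tran_Type = c("LoanIssue", "ClientInstallment", "LoanIssue", "ClientInstallment"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61)
)

# What the ddply call effectively computes for every group: it indexes the
# global `test`, so the result is the grand total, identical for each loan.
grand_total <- sum(test[test$Tran_Type == "ClientInstallment", "TransactionAmount"])

# Per-group logic: split into one data frame per loan, then apply the
# condition inside each piece `d` rather than to the whole data set.
per_loan <- sapply(split(test, test$LoanRefId), function(d)
  sum(d$TransactionAmount[d$Tran_Type == "ClientInstallment"]))

grand_total  # 4828.08
per_loan     # named vector: 11 = 1295.47, 12 = 3532.61
```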

Is there a better way to do this conditional (logical) sum?

4 answers:

Answer 0 (score: 3)

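Someone will probably add a plyr answer, but base R, dplyr and data.table are far more widely used nowadays. plyr has been updated and upgraded into dplyr. It is worth taking the time to learn the newer implementations, because they are more efficient and more feature-rich.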

Base R

aggregate(TransactionAmount ~ LoanRefId, df[df$Tran_Type == "ClientInstallment",], sum)
#  LoanRefId TransactionAmount
#1        11           1295.47
#2        12           3532.61
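One caveat with subsetting before `aggregate`: a loan that has no `ClientInstallment` rows disappears from the result entirely. If such loans should be reported with 0, a sketch of the "logical sum" idea is to multiply the amount by the condition (TRUE/FALSE are coerced to 1/0) and aggregate over all rows. Loan 13 below is hypothetical, added only to show the difference, and `cash_in` is an illustrative column name.

```r
# Trimmed copy of the question's data plus a HYPOTHETICAL loan 13 that has
# no ClientInstallment rows (not part of the original data).
df <- data.frame(
  LoanRefId = c(11L, 11L, 12L, 12L, 13L),
  Tran_Type = c("LoanIssue", "ClientInstallment",
                "LoanIssue", "ClientInstallment", "LoanIssue"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61, 500)
)

# Multiplying by a logical turns non-matching rows into 0, so every loan is kept
df$cash_in <- df$TransactionAmount * (df$Tran_Type == "ClientInstallment")
res <- aggregate(cash_in ~ LoanRefId, data = df, FUN = sum)
res
```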

dplyr

library(dplyr)
df %>% 
  group_by(LoanRefId) %>% 
  filter(Tran_Type == "ClientInstallment") %>%
  summarise(TransactionAmount = sum(TransactionAmount))
#Source: local data frame [2 x 2]
#
#  LoanRefId TransactionAmount
#      (int)             (dbl)
#1        11           1295.47
#2        12           3532.61
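Note that `filter()` removes rows before summarising, so a loan with no `ClientInstallment` rows is dropped from the output. A sketch of the alternative that keeps every loan: index the column with the condition inside `summarise()`, since `sum(numeric(0))` is 0. Loan 13 is hypothetical and `cash_in` is an illustrative name.

```r
library(dplyr)

# Trimmed copy of the question's data plus a HYPOTHETICAL loan 13 with no
# ClientInstallment rows (added only to show the difference).
df <- data.frame(
  LoanRefId = c(11L, 11L, 12L, 12L, 13L),
  Tran_Type = c("LoanIssue", "ClientInstallment",
                "LoanIssue", "ClientInstallment", "LoanIssue"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61, 500)
)

# filter() first: loan 13 has no matching rows, so it vanishes
filtered <- df %>%
  filter(Tran_Type == "ClientInstallment") %>%
  group_by(LoanRefId) %>%
  summarise(cash_in = sum(TransactionAmount))

# Logical indexing inside summarise(): every loan is kept, loan 13 gets 0
kept <- df %>%
  group_by(LoanRefId) %>%
  summarise(cash_in = sum(TransactionAmount[Tran_Type == "ClientInstallment"]))
```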

data.table

library(data.table)
setDT(df)[Tran_Type == "ClientInstallment", sum(TransactionAmount), by=LoanRefId]
#   LoanRefId      V1
#1:        11 1295.47
#2:        12 3532.61

Note how clean the data.table syntax is :). It is a great tool to learn.
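Filtering in `i` restricts the whole computation to a single condition. If you need several conditional sums per loan in one pass, a sketch is to apply each logical condition inside `j` instead. The column names `raised` and `paid` are illustrative, and the data is a trimmed copy of the question's.

```r
library(data.table)

# Trimmed copy of the question's data (three transaction types per loan)
dt <- data.table(
  LoanRefId = rep(c(11L, 12L), each = 3),
  Tran_Type = rep(c("LoanIssue", "RaisedClientInstallment", "ClientInstallment"), times = 2),
  TransactionAmount = c(1000, 1295.5757, 1295.47, 3000, 3532.635, 3532.61)
)

# Each expression in j subsets the group's own column with its own condition
res <- dt[, .(raised = sum(TransactionAmount[Tran_Type == "RaisedClientInstallment"]),
              paid   = sum(TransactionAmount[Tran_Type == "ClientInstallment"])),
          by = LoanRefId]
res
```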

Answer 1 (score: 2)

Another base R option is tapply:

with(subset(df1, Tran_Type=='ClientInstallment'),
     tapply(TransactionAmount, LoanRefId, FUN=sum))
#    11      12 
#1295.47 3532.61 
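A closely related base R function is `rowsum`, which sums a vector by group and returns a one-column matrix with the group values as row names. This is a swapped-in alternative, not part of the original answer; a sketch on a trimmed copy of the data:

```r
# Trimmed copy of the question's data
df1 <- data.frame(
  LoanRefId = c(11L, 11L, 12L, 12L),
  Tran_Type = c("LoanIssue", "ClientInstallment", "LoanIssue", "ClientInstallment"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61)
)

keep <- df1$Tran_Type == "ClientInstallment"   # logical row selector
# Sum the selected amounts by the matching LoanRefId values
res <- rowsum(df1$TransactionAmount[keep], group = df1$LoanRefId[keep])
res
```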

Or, if we need plyr (going back in time):

library(plyr)
ddply(df1, .(LoanRefId), summarise, 
      TransactionAmount = sum(TransactionAmount[Tran_Type=='ClientInstallment']))
#  LoanRefId TransactionAmount
#1        11           1295.47
#2        12           3532.61
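This works, while the question's attempt does not, because the expression refers to the group's own columns rather than to the full data frame. To make that explicit, a sketch with an anonymous function: `ddply` passes each loan's rows as a data frame (called `d` here, an illustrative name), and the condition is applied to `d`, never to the full `df1`.

```r
library(plyr)

# Trimmed copy of the question's data
df1 <- data.frame(
  LoanRefId = c(11L, 11L, 12L, 12L),
  Tran_Type = c("LoanIssue", "ClientInstallment", "LoanIssue", "ClientInstallment"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61)
)

# `d` holds only one loan's rows, so the sum is per loan
res <- ddply(df1, .(LoanRefId), function(d)
  data.frame(cash_in = sum(d$TransactionAmount[d$Tran_Type == "ClientInstallment"])))
res
```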

Answer 3 (score: 0)

Honestly, data.table is a lifesaver.

library(data.table)
setDT(test)  # convert the data.frame to a data.table in place
test[Tran_Type == "ClientInstallment", 
     sum(TransactionAmount), by=LoanRefId]