I have a database of customer loans, and I want to do a ddply summary grouped by LoanRefId:
    LoanRefId               Tran_Type TransactionAmount
103        11               LoanIssue         1000.0000
104        11           InitiationFee          171.0000
105        11                Interest           59.6729
106        11       AdministrationFee           64.9332
107        11 RaisedClientInstallment         1295.5757
108        11       ClientInstallment         1295.4700
109        11                  PaidUp            0.0000
110        11              Adjustment            0.1361
111        11                  PaidUp            0.0000
112        12               LoanIssue         3000.0000
113        12           InitiationFee          399.0000
114        12                Interest           94.9858
115        12       AdministrationFee           38.6975
116        12 RaisedClientInstallment         3532.6350
117        12       ClientInstallment         3532.6100
118        12                  PaidUp            0.0000
119        12              Adjustment            0.0733
120        12                  PaidUp            0.0000
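However, I only want to sum certain rows for each LoanRefId. Specifically, I only want to sum the rows where Tran_Type == "ClientInstallment".
The only way I could think of (which does not seem to work) is: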
> ddply(test, c("LoanRefId"), summarise, cash_in = sum(test[test$Tran_Type == "ClientInstallment","TransactionAmount"]))
LoanRefId cash_in
1 11 4828.08
2 12 4828.08
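This is not the sum per LoanRefId. It just adds up every amount where Tran_Type == "ClientInstallment" across the whole data frame, which is wrong.
Is there a better way to do this conditional sum?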
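Answer 0 (score: 3)
Someone may add a plyr answer, but these days base R, dplyr, or data.table are more widely used. plyr has since been succeeded by dplyr. It is worth taking the time to learn the newer implementations, as they are more efficient and richer in features.
base R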
aggregate(TransactionAmount ~ LoanRefId, df[df$Tran_Type == "ClientInstallment",], sum)
# LoanRefId TransactionAmount
#1 11 1295.47
#2 12 3532.61
dplyr
library(dplyr)
df %>%
  group_by(LoanRefId) %>%
  filter(Tran_Type == "ClientInstallment") %>%
  summarise(TransactionAmount = sum(TransactionAmount))
#Source: local data frame [2 x 2]
#
# LoanRefId TransactionAmount
# (int) (dbl)
#1 11 1295.47
#2 12 3532.61
data.table
library(data.table)
setDT(df)[Tran_Type == "ClientInstallment", sum(TransactionAmount), by=LoanRefId]
# LoanRefId V1
#1: 11 1295.47
#2: 12 3532.61
Notice how clean the data.table syntax is :). It is a great tool to learn.
Answer 1 (score: 2)
Another base R option is tapply:
with(subset(df1, Tran_Type=='ClientInstallment'),
tapply(TransactionAmount, LoanRefId, FUN=sum))
# 11 12
#1295.47 3532.61
Or, if we need plyr (going back in time):
library(plyr)
ddply(df1, .(LoanRefId), summarise,
TransactionAmount = sum(TransactionAmount[Tran_Type=='ClientInstallment']))
# LoanRefId TransactionAmount
#1 11 1295.47
#2 12 3532.61
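To see why the version in the question returned the same total for both loans while this one does not, here is a minimal base R sketch. It assumes, for illustration only, a made-up data frame `toy`, and uses `split` to stand in for the per-group pieces that ddply hands to summarise. Referring to the full data frame by name inside the summary ignores the grouping, whereas indexing the group's own column respects it.

```r
# Hypothetical toy data for illustration, not the question's data set
toy <- data.frame(
  LoanRefId = c(11, 11, 12, 12),
  Tran_Type = c("LoanIssue", "ClientInstallment",
                "LoanIssue", "ClientInstallment"),
  TransactionAmount = c(1000, 1295.47, 3000, 3532.61),
  stringsAsFactors = FALSE
)

# One data frame per LoanRefId, similar in spirit to ddply's grouping
pieces <- split(toy, toy$LoanRefId)

# Mimics the question's call: each group still sums over the full data frame
whole <- sapply(pieces, function(piece)
  sum(toy[toy$Tran_Type == "ClientInstallment", "TransactionAmount"]))

# Mimics the corrected call: each group sums only its own matching rows
per_group <- sapply(pieces, function(piece)
  sum(piece$TransactionAmount[piece$Tran_Type == "ClientInstallment"]))

print(whole)
print(per_group)
```

In this sketch `whole` holds the same grand total for every group, while `per_group` holds one total per LoanRefId.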
Answer 2 (score: 2)
Here is another possibility, just for completeness:
(defun own-reverse (list &optional (acc ()))
(if (endp list)
acc
(own-reverse (rest list)
(cons (first list) acc))))
Answer 3 (score: 0)
Honestly, data.table can be a lifesaver.
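library(data.table)
# setDT converts test to a data.table in place, which this syntax requires
setDT(test)[Tran_Type == "ClientInstallment",
            sum(TransactionAmount), by=LoanRefId]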