I have a data frame called cust:
> head(cust, 15)
Txn_date Cust_no Credit
1 2013-12-02 12345000 400.00
2 2013-12-02 12345000 300.00
3 2013-12-02 12345000 304.71
4 2013-12-02 12345000 475.00
5 2013-12-02 12345000 325.00
6 2013-12-02 34567890 1390.00
7 2013-12-02 34567890 100.00
8 2013-12-02 34567890 500.00
9 2013-12-02 23232323 5.00
10 2013-12-02 23232323 130.00
11 2013-12-02 23232323 5975.00
12 2013-12-02 23232323 3711.00
13 2013-12-02 14345422 12530.50
14 2013-12-02 14345422 3312.00
15 2013-12-02 98765432 370.00
To compute a running total of Credit for each Cust_no, I used within() and applied cumsum through ave(), as follows:
newCust <- within(cust, {
  RunningTotal <- ave(Credit, Cust_no, FUN = cumsum)
})
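A self-contained sketch that rebuilds the 15 rows printed above and reproduces the running total. ave() splits Credit by Cust_no, applies cumsum() to each piece, and writes the results back in the original row order. Inside within() the columns can be named directly, so the cust$ prefix is not needed.

```r
# Rebuild the 15 rows shown above (the single Txn_date is recycled to every row)
cust <- data.frame(
  Txn_date = as.Date("2013-12-02"),
  Cust_no  = c(rep(12345000, 5), rep(34567890, 3), rep(23232323, 4),
               rep(14345422, 2), 98765432),
  Credit   = c(400, 300, 304.71, 475, 325, 1390, 100, 500,
               5, 130, 5975, 3711, 12530.5, 3312, 370)
)

# Cumulative sum of Credit within each Cust_no, kept in the original row order
newCust <- within(cust, {
  RunningTotal <- ave(Credit, Cust_no, FUN = cumsum)
})

head(newCust, 15)
```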
So my result looks like this; as you can see, Credit accumulates per customer number:
> head(newCust, 15)
Txn_date Cust_no Credit RunningTotal
1 2013-12-02 12345000 400.00 400.00
2 2013-12-02 12345000 300.00 700.00
3 2013-12-02 12345000 304.71 1004.71
4 2013-12-02 12345000 475.00 1479.71
5 2013-12-02 12345000 325.00 1804.71
6 2013-12-02 34567890 1390.00 1390.00
7 2013-12-02 34567890 100.00 1490.00
8 2013-12-02 34567890 500.00 1990.00
9 2013-12-02 23232323 5.00 5.00
10 2013-12-02 23232323 130.00 135.00
11 2013-12-02 23232323 5975.00 6110.00
12 2013-12-02 23232323 3711.00 9821.00
13 2013-12-02 14345422 12530.50 12530.50
14 2013-12-02 14345422 3312.00 15842.50
15 2013-12-02 98765432 370.00 370.00
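Now my question is: how can I number the transactions within each Cust_no (a running count), using the same within() logic as above with length, or perhaps some other approach?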
I also tried aggregate() and apply, but I could not get the desired output, which is this:
Txn_date Cust_no Credit Frequency
1 2013-12-02 12345000 400.00 1
2 2013-12-02 12345000 300.00 2
3 2013-12-02 12345000 304.71 3
4 2013-12-02 12345000 475.00 4
5 2013-12-02 12345000 325.00 5
6 2013-12-02 34567890 1390.00 1
7 2013-12-02 34567890 100.00 2
8 2013-12-02 34567890 500.00 3
9 2013-12-02 23232323 5.00 1
10 2013-12-02 23232323 130.00 2
11 2013-12-02 23232323 5975.00 3
12 2013-12-02 23232323 3711.00 4
13 2013-12-02 14345422 12530.50 1
14 2013-12-02 14345422 3312.00 2
15 2013-12-02 98765432 370.00 1
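A minimal sketch of why aggregate() and a plain length-based ave() do not produce the table above: both compute the total number of transactions per customer rather than a running count. aggregate() collapses the data to one row per Cust_no, while ave(..., FUN = length) keeps one value per row but gives every row of a customer the same total. The data are the 15 rows from the question; the column names Transactions and TotalTxns are illustrative choices.

```r
cust <- data.frame(
  Txn_date = as.Date("2013-12-02"),
  Cust_no  = c(rep(12345000, 5), rep(34567890, 3), rep(23232323, 4),
               rep(14345422, 2), 98765432),
  Credit   = c(400, 300, 304.71, 475, 325, 1390, 100, 500,
               5, 130, 5975, 3711, 12530.5, 3312, 370)
)

# One row per customer: the total number of transactions
totals <- aggregate(Credit ~ Cust_no, data = cust, FUN = length)
names(totals)[2] <- "Transactions"
totals

# One value per row, but every row of a customer repeats the same total
cust$TotalTxns <- ave(cust$Credit, cust$Cust_no, FUN = length)
cust
```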
Answer (score: 2)
You could try
within(cust, Frequency <- ave(seq_along(Cust_no), Cust_no, FUN=seq_along))
# Txn_date Cust_no Credit Frequency
#1 2013-12-02 12345000 400.00 1
#2 2013-12-02 12345000 300.00 2
#3 2013-12-02 12345000 304.71 3
#4 2013-12-02 12345000 475.00 4
#5 2013-12-02 12345000 325.00 5
#6 2013-12-02 34567890 1390.00 1
#7 2013-12-02 34567890 100.00 2
#8 2013-12-02 34567890 500.00 3
#9 2013-12-02 23232323 5.00 1
#10 2013-12-02 23232323 130.00 2
#11 2013-12-02 23232323 5975.00 3
#12 2013-12-02 23232323 3711.00 4
#13 2013-12-02 14345422 12530.50 1
#14 2013-12-02 14345422 3312.00 2
#15 2013-12-02 98765432 370.00 1
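The ave() approach above works because ave() calls FUN on each group's slice of the first argument. seq_along(Cust_no) only supplies an integer vector of the right length, and seq_along() on each slice returns 1, 2, ..., n. A small sketch on a hypothetical grouping vector ids (not from the question) shows that this also works when a group's rows are not contiguous:

```r
# Hypothetical grouping vector whose groups are interleaved
ids <- c("A", "B", "A", "C", "B", "A")

# Each group's positions are replaced by 1..n in their original order
running <- ave(seq_along(ids), ids, FUN = seq_along)
running
# [1] 1 1 2 1 2 3
```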
Or using splitstackshape:
library(splitstackshape)
# adds the running count per Cust_no as a new column named .id
getanID(cust, 'Cust_no')
Or using data.table:
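library(data.table)
# setDT() converts cust to a data.table in place; := then adds Frequency by reference,
# numbering the rows 1..N within each Cust_no group
setDT(cust)[, Frequency := seq_len(.N), by = Cust_no]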
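If no extra package is wanted and each customer's rows are contiguous (as in the sample data), a base R sketch with rle() and sequence() gives the same result. This is an alternative technique, not part of the answer above: rle() finds runs of identical consecutive Cust_no values, and sequence() expands run lengths such as c(5, 3, 4, 2, 1) into 1:5, 1:3, 1:4, 1:2, 1. Because it works on runs, it restarts the count whenever a customer's rows are interrupted, so the ave() approach is safer for unsorted data.

```r
cust <- data.frame(
  Txn_date = as.Date("2013-12-02"),
  Cust_no  = c(rep(12345000, 5), rep(34567890, 3), rep(23232323, 4),
               rep(14345422, 2), 98765432),
  Credit   = c(400, 300, 304.71, 475, 325, 1390, 100, 500,
               5, 130, 5975, 3711, 12530.5, 3312, 370)
)

# Running count per run of identical consecutive Cust_no values
runs <- rle(cust$Cust_no)
cust$Frequency <- sequence(runs$lengths)
cust

# Caveat: with interleaved rows each run restarts at 1
interleaved <- sequence(rle(c("A", "B", "A"))$lengths)
interleaved
# [1] 1 1 1
```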