R: Calculating transaction frequency and running increment for each transaction

Time: 2014-12-02 04:49:41

Tags: r dataframe grouping frequency counting

I have a data frame named cust:
    > head(cust,15)
    Txn_date  Cust_no   Credit
1  2013-12-02 12345000  400.00
2  2013-12-02 12345000  300.00
3  2013-12-02 12345000  304.71
4  2013-12-02 12345000  475.00
5  2013-12-02 12345000  325.00
6  2013-12-02 34567890  1390.00
7  2013-12-02 34567890  100.00
8  2013-12-02 34567890  500.00
9  2013-12-02 23232323  5.00
10 2013-12-02 23232323  130.00
11 2013-12-02 23232323  5975.00
12 2013-12-02 23232323  3711.00
13 2013-12-02 14345422  12530.50
14 2013-12-02 14345422  3312.00
15 2013-12-02 98765432  370.00

To compute a running total of Credit for each Cust_no, I used within() and applied cumsum through ave(), as shown below:

newCust <- within(cust, {
   RunningTotal <- ave(Credit, Cust_no, FUN = cumsum)})

So my result looks like this; as you can see, Credit accumulates within each customer number:
>head(newCust,15)
   Txn_date  Cust_no   Credit   RunningTotal    
1  2013-12-02 12345000  400.00   400.00
2  2013-12-02 12345000  300.00   700.00
3  2013-12-02 12345000  304.71   1004.71
4  2013-12-02 12345000  475.00   1479.71
5  2013-12-02 12345000  325.00   1804.71
6  2013-12-02 34567890  1390.00  1390.00
7  2013-12-02 34567890  100.00   1490.00
8  2013-12-02 34567890  500.00   1990.00
9  2013-12-02 23232323  5.00     5.00
10 2013-12-02 23232323  130.00   135.00
11 2013-12-02 23232323  5975.00  6110.00
12 2013-12-02 23232323  3711.00  9821.00
13 2013-12-02 14345422  12530.50 12530.50
14 2013-12-02 14345422  3312.00  15842.5
15 2013-12-02 98765432  370.00   370

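Now my question is: how can I use the same logic, with within and length, to find the number of transactions for each Cust_no? Or perhaps some other approach would work.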

I also tried aggregate() and apply, but I could not get the desired output, which is shown below:

   Txn_date  Cust_no   Credit   Frequency   
1  2013-12-02 12345000  400.00   1
2  2013-12-02 12345000  300.00   2
3  2013-12-02 12345000  304.71   3
4  2013-12-02 12345000  475.00   4
5  2013-12-02 12345000  325.00   5
6  2013-12-02 34567890  1390.00  1
7  2013-12-02 34567890  100.00   2
8  2013-12-02 34567890  500.00   3
9  2013-12-02 23232323  5.00     1
10 2013-12-02 23232323  130.00   2
11 2013-12-02 23232323  5975.00  3
12 2013-12-02 23232323  3711.00  4
13 2013-12-02 14345422  12530.50 1
14 2013-12-02 14345422  3312.00  2
15 2013-12-02 98765432  370.00   1

1 answer:

Answer 0 (score: 2)

You can try

 within(cust, Frequency <- ave(seq_along(Cust_no), Cust_no, FUN=seq_along))
 #     Txn_date  Cust_no   Credit Frequency
 #1  2013-12-02 12345000   400.00         1
 #2  2013-12-02 12345000   300.00         2
 #3  2013-12-02 12345000   304.71         3
 #4  2013-12-02 12345000   475.00         4
 #5  2013-12-02 12345000   325.00         5
 #6  2013-12-02 34567890  1390.00         1
 #7  2013-12-02 34567890   100.00         2
 #8  2013-12-02 34567890   500.00         3
 #9  2013-12-02 23232323     5.00         1
 #10 2013-12-02 23232323   130.00         2
 #11 2013-12-02 23232323  5975.00         3
 #12 2013-12-02 23232323  3711.00         4
 #13 2013-12-02 14345422 12530.50         1
 #14 2013-12-02 14345422  3312.00         2
 #15 2013-12-02 98765432   370.00         1
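
To see why `seq_along` rather than `length` is the right function here, a minimal self-contained sketch may help. It rebuilds only the first eight rows of the question's data; the `data.frame()` construction and the extra `GroupSize` column are assumptions added for illustration, not part of the original post. `ave()` splits its first argument by `Cust_no` and applies `FUN` to each piece: `seq_along` yields 1, 2, ..., n for a piece of length n (the running counter), while `length` yields the group size, which `ave()` repeats on every row of that customer.

```r
# Illustrative subset of the question's data (first eight rows only).
cust <- data.frame(
  Txn_date = as.Date("2013-12-02"),
  Cust_no  = c(rep(12345000, 5), rep(34567890, 3)),
  Credit   = c(400, 300, 304.71, 475, 325, 1390, 100, 500)
)

res <- within(cust, {
  # Running counter within each customer: 1, 2, ..., n
  Frequency <- ave(seq_along(Cust_no), Cust_no, FUN = seq_along)
  # Group size, repeated on every row of the same customer (hypothetical extra column)
  GroupSize <- ave(seq_along(Cust_no), Cust_no, FUN = length)
})

print(res)
```

In this subset, `Frequency` counts 1 to 5 for the first customer and 1 to 3 for the second, whereas `GroupSize` is 5 on the first five rows and 3 on the remaining three.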

Or using splitstackshape

 library(splitstackshape)
 getanID(cust, 'Cust_no')

Or using data.table

 library(data.table)
 setDT(cust)[, Frequency:=1:.N, by=Cust_no]
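
If installing a package is not an option, the per-customer counter and the running total from the question can be computed side by side in base R. The sketch below is only an illustration: the data frame `toy` and its values are hypothetical, and its rows are deliberately interleaved to show that `ave()` groups by value, so the data does not need to be sorted by `Cust_no` first.

```r
# Hypothetical toy data with customers interleaved (not sorted by Cust_no).
toy <- data.frame(
  Cust_no = c(1, 2, 1, 2, 1),
  Credit  = c(10, 5, 20, 15, 30)
)

# Running counter per customer, following the existing row order.
toy$Frequency <- ave(seq_along(toy$Cust_no), toy$Cust_no, FUN = seq_along)

# Running total of Credit per customer, as in the question.
toy$RunningTotal <- ave(toy$Credit, toy$Cust_no, FUN = cumsum)

print(toy)
```

Here customer 1 gets the counters 1, 2, 3 on rows 1, 3 and 5, and customer 2 gets 1, 2 on rows 2 and 4; the counter follows the order in which each customer's rows appear.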