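If you look at my profile, all of my questions are about data frames, and here is another one!
I have a data frame that is the result of a merge between debit and credit transactions: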
>head(allTxns)
Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
3 12345 2014-10-01 200 2014-10-15 800
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
My goal is to keep the rows where DebitDate falls within 5 days after CreditDate, so I tried to subset the data, using the : operator to build the date range:
FiveDays <- allTxns$CreditDate+5 #Results in a vector which has date + 5 days
allTxns <- cbind(allTxns[1:2],FiveDays,allTxns[4:6]) #Adding the vector as a column of dataframe
newDf <- allTxns[allTxns$DebitDate %in% allTxns$CreditDate:allTxns$FiveDays]
With the code above I get the following warnings, which point to a logic error: only the first element is used.
Warning messages:
1: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
2: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
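As a result, the output I want is produced only for the first Cust_no (12345), and the condition is not applied to the other rows. How can I make sure the range condition is applied to every row?
Incorrect output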
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
Expected output
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
Answer 0 (score: 1)
Try
allTxns[with(allTxns, CreditDate < DebitDate & DebitDate <= FiveDays), ]
# Cust_no CreditDate FiveDays Credit DebitDate Debit
#1 12345 2014-10-01 2014-10-06 200 2014-10-03 400
#2 12345 2014-10-01 2014-10-06 200 2014-10-04 150
#4 33344 2014-10-03 2014-10-08 500 2014-10-04 50
#5 33344 2014-10-03 2014-10-08 500 2014-10-05 504
#6 33344 2014-10-03 2014-10-08 500 2014-10-06 332
#7 33344 2014-10-03 2014-10-08 500 2014-10-08 56
#10 66554 2014-10-10 2014-10-15 660 2014-10-11 400
#11 66554 2014-10-10 2014-10-15 660 2014-10-12 150
#12 66554 2014-10-10 2014-10-15 660 2014-10-13 800
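To see why the comparison works where : does not, here is a minimal sketch on two made-up rows (the vectors credit and debit are hypothetical, not the asker's data). As the warning says, : uses only the first element of each operand, so every debit date ends up being tested against the first row's window. The comparison operators work element by element, so each row is tested against its own window. The sketch spells out the first-element range explicitly rather than calling : on whole vectors.

```r
# Hypothetical two-row data, made up for illustration
credit <- as.Date(c("2014-10-01", "2014-10-10"))
debit  <- as.Date(c("2014-10-03", "2014-10-04"))

# What credit:(credit + 5) effectively reduces to: a single range built
# from the first elements only (dates are treated as day counts here)
first_range <- as.numeric(credit[1]):as.numeric(credit[1] + 5)

# Row 2 is wrongly accepted, because its debit date is tested against
# row 1's window (2014-10-01 to 2014-10-06)
in_first_range <- as.numeric(debit) %in% first_range

# Element-wise comparison tests each debit date against its own window
in_own_window <- credit < debit & debit <= credit + 5

print(in_first_range)  # TRUE TRUE
print(in_own_window)   # TRUE FALSE
```

This would be consistent with rows such as 8 and 9 slipping into the incorrect output in the question, although that is a guess, since the real data is not shown in full.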
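Answer 1 (score: 1)
This old question already has an accepted answer. However, I noticed that both the question and the answer can be simplified, because there is no need to create the extra FiveDays column: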
allTxns[with(allTxns, CreditDate <= DebitDate & DebitDate <= CreditDate + 5L), ]
Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
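For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample rows from the question and applies an equivalent filter. It assumes both date columns are of class Date (adding 5 to a character column would fail). The helper name gap is my own, introduced only for illustration: it holds the number of days between credit and debit, which can make the condition easier to read.

```r
# Sample rows copied from the question; dates stored as Date, not character
allTxns <- data.frame(
  Cust_no    = rep(c(12345, 33344, 66554), times = c(3, 4, 5)),
  CreditDate = as.Date(rep(c("2014-10-01", "2014-10-03", "2014-10-10"),
                           times = c(3, 4, 5))),
  Credit     = rep(c(200, 500, 660), times = c(3, 4, 5)),
  DebitDate  = as.Date(c("2014-10-03", "2014-10-04", "2014-10-15",
                         "2014-10-04", "2014-10-05", "2014-10-06", "2014-10-08",
                         "2014-10-04", "2014-10-05", "2014-10-11", "2014-10-12",
                         "2014-10-13")),
  Debit      = c(400, 150, 800, 50, 504, 332, 56, 150, 800, 400, 150, 800)
)

# Days elapsed between credit and debit, one number per row
# ("gap" is a hypothetical helper name, not part of the original answers)
gap <- as.numeric(allTxns$DebitDate - allTxns$CreditDate)

# Keep debits made on the credit date or up to 5 days after it
newDf <- allTxns[gap >= 0 & gap <= 5, ]

print(rownames(newDf))  # "1" "2" "4" "5" "6" "7" "10" "11" "12"
```

Note that the lower bound here is inclusive (gap >= 0), matching the <= used in this answer. Use gap > 0 instead if debits on the same day as the credit should be excluded.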