I have two data frames, as follows:
Trade data frame: tradeData (sample):
Login OpenTime CloseTime Decision
859 13/01/2014 13/01/2014 1
859 16/01/2014 16/01/2014 1
859 21/01/2014 21/01/2014 1
859 21/01/2014 21/01/2014 1
859 22/01/2014 22/01/2014 1
859 23/01/2014 23/01/2014 1
859 27/01/2014 27/01/2014 1
859 03/02/2014 03/02/2014 1
859 04/02/2014 05/02/2014 1
859 07/02/2014 07/02/2014 1
859 11/02/2014 13/02/2014 1
939 06/02/2014 28/02/2014 1
939 06/02/2014 28/02/2014 1
939 06/02/2014 28/02/2014 1
1455 03/04/2014 03/04/2014 1
1455 04/04/2014 04/04/2014 1
1455 04/04/2014 07/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 09/04/2014 30/04/2014 1
1455 30/04/2014 30/04/2014 1
And another data frame of dates: datesData (sample):
Login B_A A_B
859 22/01/2014 23/01/2014
859 03/02/2014 07/02/2014
859 11/02/2014 12/02/2014
939 06/02/2014 01/01/2200
1455 04/04/2014 08/04/2014
1455 09/05/2014 30/06/2014
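Any trade (i.e. a row of the tradeData data frame) that was opened between the two dates in any row of the datesData data frame, and whose Login matches that row, should get a 0 in the Decision column. It must be opened on or after the date in the B_A column and before the date in the A_B column. The Decision column is pre-filled with 1s, so all I need to do is insert the 0s.
The resulting tradeData data frame would look like this: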
Login OpenTime CloseTime Decision
859 13/01/2014 13/01/2014 1
859 16/01/2014 16/01/2014 1
859 21/01/2014 21/01/2014 1
859 21/01/2014 21/01/2014 1
859 22/01/2014 22/01/2014 0
859 23/01/2014 23/01/2014 1
859 27/01/2014 27/01/2014 1
859 03/02/2014 03/02/2014 0
859 04/02/2014 05/02/2014 0
859 07/02/2014 07/02/2014 1
859 11/02/2014 13/02/2014 0
939 06/02/2014 28/02/2014 0
939 06/02/2014 28/02/2014 0
939 06/02/2014 28/02/2014 0
1455 03/04/2014 03/04/2014 1
1455 04/04/2014 04/04/2014 0
1455 04/04/2014 07/04/2014 0
1455 08/04/2014 08/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 09/04/2014 30/04/2014 0
1455 30/04/2014 30/04/2014 1
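So, for example, the fifth row of tradeData was opened between 22/01/2014 and 23/01/2014 (the first row of datesData) and matches that row's Login, so it gets a 0.
Any help would be great! Let me know if anything is unclear.
Thanks!
Mike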
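The rule is a half-open window per Login: on or after B_A, strictly before A_B. A minimal sketch checking it on the worked example; `to_date` and `in_window` are hypothetical helper names, not part of the question:

```r
# Hypothetical helpers for the rule B_A <= OpenTime < A_B
to_date <- function(x) as.Date(x, format = "%d/%m/%Y")
in_window <- function(open, b_a, a_b) open >= b_a & open < a_b

# Row 5 of tradeData vs. row 1 of datesData (Login 859): inside the window
in_window(to_date("22/01/2014"), to_date("22/01/2014"), to_date("23/01/2014"))
#> [1] TRUE
# Row 6 opens on the A_B date itself, so the half-open window excludes it
in_window(to_date("23/01/2014"), to_date("22/01/2014"), to_date("23/01/2014"))
#> [1] FALSE
```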
Answer 0 (score: 3)
One way is to use the data.table package:
library(data.table)
# turn tradeData into a data.table keyed by Login
setDT(tradeData)
setkey(tradeData, Login)
# convert OpenTime to Date
tradeData[, OpenTime := as.Date(OpenTime, format = "%d/%m/%Y")]
# convert the window columns of datesData to Date (on a copy)
df1 = datesData
df1$B_A = as.Date(df1$B_A, format = "%d/%m/%Y")
df1$A_B = as.Date(df1$A_B, format = "%d/%m/%Y")
# per Login group: 0 if OpenTime falls in any [B_A, A_B) window of that Login, else 1
tradeData[, Decision := sapply(OpenTime, function(d) {
  dt = df1[df1$Login == Login, ]
  as.integer(!any(d >= dt$B_A & d < dt$A_B))
}), by = Login]
The result looks like this:
> tradeData
Login OpenTime CloseTime Decision
1: 859 2014-01-13 13/01/2014 1
2: 859 2014-01-16 16/01/2014 1
3: 859 2014-01-21 21/01/2014 1
4: 859 2014-01-21 21/01/2014 1
5: 859 2014-01-22 22/01/2014 0
6: 859 2014-01-23 23/01/2014 1
7: 859 2014-01-27 27/01/2014 1
8: 859 2014-02-03 03/02/2014 0
9: 859 2014-02-04 05/02/2014 0
10: 859 2014-02-07 07/02/2014 1
11: 859 2014-02-11 13/02/2014 0
12: 939 2014-02-06 28/02/2014 0
13: 939 2014-02-06 28/02/2014 0
14: 939 2014-02-06 28/02/2014 0
15: 1455 2014-04-03 03/04/2014 1
16: 1455 2014-04-04 04/04/2014 0
17: 1455 2014-04-04 07/04/2014 0
18: 1455 2014-04-08 08/04/2014 1
19: 1455 2014-04-08 08/04/2014 1
20: 1455 2014-04-09 30/04/2014 1
21: 1455 2014-04-30 30/04/2014 1
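If your data.table version supports non-equi joins (added in 1.9.8, as far as I know), the same update can be written as a single update join instead of a per-group sapply. This is a swapped-in technique, not the answer's code; a sketch assuming the sample data from the question:

```r
library(data.table)

# Sample data copied from the question
tradeData <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Login OpenTime   CloseTime  Decision
859   13/01/2014 13/01/2014 1
859   16/01/2014 16/01/2014 1
859   21/01/2014 21/01/2014 1
859   21/01/2014 21/01/2014 1
859   22/01/2014 22/01/2014 1
859   23/01/2014 23/01/2014 1
859   27/01/2014 27/01/2014 1
859   03/02/2014 03/02/2014 1
859   04/02/2014 05/02/2014 1
859   07/02/2014 07/02/2014 1
859   11/02/2014 13/02/2014 1
939   06/02/2014 28/02/2014 1
939   06/02/2014 28/02/2014 1
939   06/02/2014 28/02/2014 1
1455  03/04/2014 03/04/2014 1
1455  04/04/2014 04/04/2014 1
1455  04/04/2014 07/04/2014 1
1455  08/04/2014 08/04/2014 1
1455  08/04/2014 08/04/2014 1
1455  09/04/2014 30/04/2014 1
1455  30/04/2014 30/04/2014 1
")
datesData <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Login B_A        A_B
859   22/01/2014 23/01/2014
859   03/02/2014 07/02/2014
859   11/02/2014 12/02/2014
939   06/02/2014 01/01/2200
1455  04/04/2014 08/04/2014
1455  09/05/2014 30/06/2014
")

setDT(tradeData)
setDT(datesData)
tradeData[, OpenTime := as.Date(OpenTime, format = "%d/%m/%Y")]
datesData[, `:=`(B_A = as.Date(B_A, format = "%d/%m/%Y"),
                 A_B = as.Date(A_B, format = "%d/%m/%Y"))]

# Update join: rows of tradeData with the same Login and B_A <= OpenTime < A_B get 0
tradeData[datesData, on = .(Login, OpenTime >= B_A, OpenTime < A_B), Decision := 0L]
tradeData
```

The update happens by reference, so row order is preserved and no per-group R function is called; trades matching no window simply keep their pre-filled 1.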
Answer 1 (score: 2)
Here is an equivalent solution using the sqldf package.
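library(sqldf)
# convert the date columns to Date
tradeData$OpenTime <- as.Date(tradeData$OpenTime, format="%d/%m/%Y")
datesData$B_A <- as.Date(datesData$B_A, format="%d/%m/%Y")
datesData$A_B <- as.Date(datesData$A_B, format="%d/%m/%Y")
# set Decision to 0 for every trade opened inside a [B_A, A_B) window of the same Login,
# then return the updated table (sqldf works on a temporary SQLite copy, so the
# R data frame itself is not modified; assign the result if you want to keep it)
sqldf(c("UPDATE tradeData
         SET Decision = 0
         WHERE EXISTS (SELECT * FROM datesData WHERE
                       tradeData.Login = datesData.Login AND
                       tradeData.OpenTime >= datesData.B_A AND
                       tradeData.OpenTime < datesData.A_B)",
        "SELECT * FROM tradeData"))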
# Login OpenTime CloseTime Decision
# 1 859 2014-01-13 13/01/2014 1
# 2 859 2014-01-16 16/01/2014 1
# 3 859 2014-01-21 21/01/2014 1
# 4 859 2014-01-21 21/01/2014 1
# 5 859 2014-01-22 22/01/2014 0
# 6 859 2014-01-23 23/01/2014 1
# 7 859 2014-01-27 27/01/2014 1
# 8 859 2014-02-03 03/02/2014 0
# 9 859 2014-02-04 05/02/2014 0
# 10 859 2014-02-07 07/02/2014 1
# 11 859 2014-02-11 13/02/2014 0
# 12 939 2014-02-06 28/02/2014 0
# 13 939 2014-02-06 28/02/2014 0
# 14 939 2014-02-06 28/02/2014 0
# 15 1455 2014-04-03 03/04/2014 1
# 16 1455 2014-04-04 04/04/2014 0
# 17 1455 2014-04-04 07/04/2014 0
# 18 1455 2014-04-08 08/04/2014 1
# 19 1455 2014-04-08 08/04/2014 1
# 20 1455 2014-04-09 30/04/2014 1
# 21 1455 2014-04-30 30/04/2014 1
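The correlated EXISTS subquery can also be expressed in base R without sqldf: pair each trade with every window of the same Login, then zero out any trade that falls in at least one of them. A minimal sketch on a small subset of the sample rows; `trades`, `windows`, `row_id`, `pairs`, and `hit` are illustrative names, not from either answer:

```r
# Subset of tradeData from the question (hypothetical selection of rows)
trades <- data.frame(
  Login    = c(859, 859, 859, 939, 1455, 1455),
  OpenTime = as.Date(c("21/01/2014", "22/01/2014", "23/01/2014",
                       "06/02/2014", "08/04/2014", "09/04/2014"),
                     format = "%d/%m/%Y"),
  Decision = 1L
)
# datesData from the question
windows <- data.frame(
  Login = c(859, 859, 859, 939, 1455, 1455),
  B_A = as.Date(c("22/01/2014", "03/02/2014", "11/02/2014",
                  "06/02/2014", "04/04/2014", "09/05/2014"), format = "%d/%m/%Y"),
  A_B = as.Date(c("23/01/2014", "07/02/2014", "12/02/2014",
                  "01/01/2200", "08/04/2014", "30/06/2014"), format = "%d/%m/%Y")
)

trades$row_id <- seq_len(nrow(trades))
# Every trade/window pair sharing a Login (inner join on Login)
pairs <- merge(trades, windows, by = "Login")
# Pairs where the trade opened in the half-open window [B_A, A_B)
hit <- pairs$OpenTime >= pairs$B_A & pairs$OpenTime < pairs$A_B
trades$Decision[unique(pairs$row_id[hit])] <- 0L
trades$row_id <- NULL
trades
```

Like the SQL version, this compares every trade with every window of its Login, which is fine at this size but grows with trades × windows per Login.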