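I have two data sets, Table A and Table B. Both tables are keyed on Item.
I am working on an R script that sets the Flag column in Table A to "x" when the row's date falls within the date range given in Table B.
I tried using the between function from dplyr on my original data set, but I get an "expecting a single value" error message.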
Table A
Item Date Flag
Test1 1/1/2018
Test1 1/2/2018 x
Test1 1/3/2018 x
Test1 1/4/2018 x
Test1 1/5/2018
Test2 1/6/2018
Test2 1/7/2018 x
Test2 1/8/2018
Table B
Item Sdate Edate
Test 1 1/2/2018 1/4/2018
Test 2 1/7/2018 1/7/2018
Answer 0 (score: 1)
You can use dplyr:
...
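library(dplyr)
fmt <- "%m/%d/%Y"  # assumed date format; adjust if yours differs
TableA %>%
  # merge in the TableB information ("Test 1" in TableB must match "Test1")
  left_join(mutate(TableB, Item = gsub(" ", "", Item)), by = "Item") %>%
  mutate(Flag = if_else(as.Date(Date, format = fmt) >= as.Date(Sdate, format = fmt) &
                          as.Date(Date, format = fmt) <= as.Date(Edate, format = fmt),
                        "x", "", missing = "")) %>%
  select(Item, Date, Flag)  # remove the TableB columns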
Item Date Flag
1 Test1 1/1/2018
2 Test1 1/2/2018 x
3 Test1 1/3/2018 x
4 Test1 1/4/2018 x
5 Test1 1/5/2018
6 Test2 1/6/2018
7 Test2 1/7/2018 x
8 Test2 1/8/2018
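The error in the question is likely because older dplyr versions of between() accept only single (scalar) bounds, whereas here the bounds differ per row. As a minimal sketch of the same join-then-compare logic, the following uses base R only (merge rather than dplyr). The sample data is copied from the question, the "%m/%d/%Y" format is an assumption about how the dates are written, and to_date is a hypothetical helper introduced only for this illustration.

```r
# Sample data from the question; Item in TableB is written without the space.
TableA <- data.frame(
  Item = c(rep("Test1", 5), rep("Test2", 3)),
  Date = sprintf("1/%d/2018", 1:8),
  stringsAsFactors = FALSE
)
TableB <- data.frame(
  Item  = c("Test1", "Test2"),
  Sdate = c("1/2/2018", "1/7/2018"),
  Edate = c("1/4/2018", "1/7/2018"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse m/d/Y strings (assumed format) into Date objects.
to_date <- function(x) as.Date(x, format = "%m/%d/%Y")

# Left join on Item, then compare each row's date with its own range.
merged <- merge(TableA, TableB, by = "Item", all.x = TRUE)
in_range <- to_date(merged$Date) >= to_date(merged$Sdate) &
  to_date(merged$Date) <= to_date(merged$Edate)

# Rows without a matching range give NA; treat them as not flagged.
merged$Flag <- ifelse(!is.na(in_range) & in_range, "x", "")

# Keep the original columns and restore a predictable row order.
result <- merged[order(merged$Item, to_date(merged$Date)),
                 c("Item", "Date", "Flag")]
print(result)
```

Because the comparison operators are vectorised, each row is checked against the Sdate and Edate that were joined onto it, which is what a scalar-only between() cannot do.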
Answer 1 (score: 0)
You can do this with the non-equi join provided by the data.table package:
library(data.table)
table_a <- as.data.table(table_a)
table_b <- as.data.table(table_b)
# Need to convert dates to Date class if not type Date already:
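# (the "%m/%d/%Y" format is an assumption; adjust it to match your data)
table_a[, Date := as.Date(Date, format = "%m/%d/%Y")]
table_b[, Sdate := as.Date(Sdate, format = "%m/%d/%Y")]
table_b[, Edate := as.Date(Edate, format = "%m/%d/%Y")]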
# Make sure the values in the Item column can be joined ("Test 1" should be "Test1")
table_b[, Item := gsub(" ", "", Item)]
# Create a new empty flag column
table_a[, Flag := ""]
# Non-equi join, match rows where the value in the Item column is the same and the
# value in the Date column is between the Sdate and Edate,
# then update the flag column for those rows in table_a
table_a[table_b, on = .(Item, Date >= Sdate, Date <= Edate), Flag := "x"]
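
For anyone who wants to try the non-equi join end to end, here is a small reproducible sketch. It rebuilds the sample tables from the question directly as data.tables; the "%m/%d/%Y" date format is an assumption, and the Item values are entered without the space so the gsub step is not needed.

```r
library(data.table)

# Sample data copied from the question; the m/d/Y date format is an assumption.
table_a <- data.table(
  Item = c(rep("Test1", 5), rep("Test2", 3)),
  Date = as.Date(sprintf("1/%d/2018", 1:8), format = "%m/%d/%Y")
)
table_b <- data.table(
  Item  = c("Test1", "Test2"),
  Sdate = as.Date(c("1/2/2018", "1/7/2018"), format = "%m/%d/%Y"),
  Edate = as.Date(c("1/4/2018", "1/7/2018"), format = "%m/%d/%Y")
)

# Start with an empty flag, then update matching rows by reference.
table_a[, Flag := ""]
table_a[table_b, on = .(Item, Date >= Sdate, Date <= Edate), Flag := "x"]

print(table_a)
```

The update happens in place on table_a, so no reassignment is needed, and an item with several date ranges in table_b is handled without duplicating rows.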