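I am trying to subset/match data by group across two data.tables, but I can't figure out how to do this in R. I have the following data.table, which has a City_ID and a timestamp (column name = Time).
library(data.table)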
timetable <- data.table(City_ID=c("12","9"),
Time=c("12-29-2013-22:05:03","12-29-2013-11:59:00"))
I have a second data.table with several observations per city, each with a timestamp (plus additional data). That table looks like this:
DT = data.table(City_ID =c("12","12","12","9","9","9"),
Time= c("12-29-2013-13:05:13","12-29-2013-22:05:03",
"12-28-2013-13:05:13","12-29-2013-11:59:00",
"01-30-2013-10:05:03","12-28-2013-13:05:13"),
Other=1:6)
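Now I need to find, for each city, the observations in DT whose Time is >= the Time for that city in the other data.table, "timetable" (which is basically a lookup table). Only those records should be kept, including the columns that are not used in the comparison (in the example, the column "Other"). The result I want looks like this: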
desiredresult = data.table(City_ID=c("12","9"),
Time= c("12-29-2013-22:05:03","12-29-2013-11:59:00"),
Other=c("2","4"))
I tried the following:
setkey(DT, City_ID, Time)
setkey(timetable, City_ID)
failedresult = DT[,Time >= timetable[Time], by=City_ID]
failedresult2 = DT[,Time >= timetable, by=City_ID]
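BTW: I know it would probably be better to split the date and time into separate columns, but that would make the example more complicated (when I tested finding the minimum of the timestamps with data.table, it seemed to work).
Answer 0 (score: 3)
Here is one way to do this: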
# 1) transform string to POSIXct object
DT[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]
timetable[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]
# 2) set key
setkey(DT, City_ID)
setkey(timetable, City_ID)
# 3) join tables
DT2 <- DT[timetable]
# 4) extract rows and columns
DT2[Time >= Time.1, names(DT), with = FALSE]
#    City_ID                Time Other
# 1:      12 2013-12-29 22:05:03     2
# 2:       9 2013-12-29 11:59:00     4
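Depending on the data.table version, the clashing column that comes from timetable may be named i.Time rather than Time.1 after the join, in which case step 4 above will not find Time.1. The following is a minimal sketch of the same four steps that sidesteps the naming question by renaming the lookup column before joining. The names fmt, cutoffs, Cutoff, joined and result are hypothetical and only for illustration, and tz = "UTC" is an assumption, since the question does not state a time zone.

```r
library(data.table)

# Example data from the question
timetable <- data.table(City_ID = c("12", "9"),
                        Time = c("12-29-2013-22:05:03", "12-29-2013-11:59:00"))
DT <- data.table(City_ID = c("12", "12", "12", "9", "9", "9"),
                 Time = c("12-29-2013-13:05:13", "12-29-2013-22:05:03",
                          "12-28-2013-13:05:13", "12-29-2013-11:59:00",
                          "01-30-2013-10:05:03", "12-28-2013-13:05:13"),
                 Other = 1:6)

# 1) parse the strings into POSIXct so that >= compares real points in time
#    (tz = "UTC" is an assumption; use the zone your data was recorded in)
fmt <- "%m-%d-%Y-%H:%M:%S"
DT[, Time := as.POSIXct(Time, format = fmt, tz = "UTC")]
timetable[, Time := as.POSIXct(Time, format = fmt, tz = "UTC")]

# 2) rename the lookup column on a copy so it cannot clash with DT$Time
cutoffs <- copy(timetable)
setnames(cutoffs, "Time", "Cutoff")

# 3) join on City_ID: every DT row gets the cutoff of its city
joined <- DT[cutoffs, on = "City_ID"]

# 4) keep rows at or after the cutoff, and only DT's original columns
result <- joined[Time >= Cutoff, .(City_ID, Time, Other)]
print(result)
```

Using on = instead of setkey leaves the row order of DT untouched, and the explicit Cutoff name makes the filter in step 4 easier to read.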