Question

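I have a file (location) that has x, y coordinates and a date/time identifier. I want to pull information from a second table (weather) that has a "similar" date/time variable and the covariates (temperature and wind speed). The trick is that the date/times are not exactly the same in the two tables. For each location record, I want to select the weather record that is closest in time. I know I will probably need some kind of loop for this.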

Example location                                    example weather

x    y     date/time                         date/time           temp        wind
1    3     01/02/2003 18:00                  01/01/2003 13:00     12          15
2    3     01/02/2003 19:00                  01/02/2003 16:34     10          16
3    4     01/03/2003 23:00                  01/02/2003 20:55     14          22
2    5     01/04/2003 02:00                  01/02/2003 21:33     14          22
                                             01/03/2003 00:22     13          19
                                             01/03/2003 14:55     12          12
                                             01/03/2003 18:00     10          12
                                             01/03/2003 23:44     2           33
                                             01/04/2003 01:55     6           22

So the final output would be a table with the correct "best"-matching weather data for each location record:

x    y     datetime               datetime           temp        wind
1    3     01/02/2003 18:00  ----  01/02/2003 16:34     10          16
2    3     01/02/2003 19:00  ----  01/02/2003 20:55     14          22
3    4     01/03/2003 23:00  ----  01/03/2003 00:22     13          19               
2    5     01/04/2003 02:00  ----  01/04/2003 01:55     6           22

Any suggestions on where to start? I would like to do this in R.

Answer 1

One quick way might be to use data.table. If you create two keyed data.tables, X and Y, the syntax is:

X[Y,roll=TRUE]

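We call this a rolling join because we roll the prevailing observation in X forward to match the row in Y. See the examples in ?data.table and the introduction vignette.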
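As a rough sketch of what the rolling join does on the question's data, assuming the data.table package is installed: the helper `parse_time`, the column names `dt.time` and `weather.time`, and the choice of UTC are illustrative assumptions, not part of the question.

```r
library(data.table)

# Hypothetical helper; tz = "UTC" is an assumption to sidestep DST effects
parse_time <- function(s) as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")

location <- data.table(
  x = c(1, 2, 3, 2),
  y = c(3, 3, 4, 5),
  dt.time = parse_time(c("01/02/2003 18:00", "01/02/2003 19:00",
                         "01/03/2003 23:00", "01/04/2003 02:00"))
)
weather <- data.table(
  dt.time = parse_time(c("01/01/2003 13:00", "01/02/2003 16:34",
                         "01/02/2003 20:55", "01/02/2003 21:33",
                         "01/03/2003 00:22", "01/03/2003 14:55",
                         "01/03/2003 18:00", "01/03/2003 23:44",
                         "01/04/2003 01:55")),
  temp = c(12, 10, 14, 14, 13, 12, 10, 2, 6),
  wind = c(15, 16, 22, 22, 19, 12, 12, 33, 22)
)

# Keep a copy of the weather timestamp: after the join, dt.time holds the
# location's time rather than the matched weather time
weather[, weather.time := dt.time]
setkey(weather, dt.time)
setkey(location, dt.time)

# X[Y, roll = TRUE]: for each location row, take the last weather row
# at or before that time (the prevailing observation)
rolled <- weather[location, roll = TRUE]
print(rolled)
```

Note that this picks the most recent earlier reading, which is not necessarily the nearest one in time.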

Another way is the zoo package, which offers locf (last observation carried forward) through na.locf; other packages may have it too.

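I'm not sure whether you mean closest in location or closest in time. If location, and that location is x, y coordinates, then I guess you will need some distance measure in 2D space. data.table only does univariate 'closest', e.g. by time. Reading your question a second time, though, it does seem you mean closest in the prevailing sense.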

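EDIT: I have seen the example data now. data.table won't do this in one step because, although it can roll forwards or backwards, it won't roll to the nearest. You could do it with an extra step using which=TRUE, and then test whether the observation after the prevailing one is actually closer.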
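The two-step idea could look roughly like the sketch below, assuming the data.table package is installed. The names `parse_time`, `i_prev`, `i_next` and `nearest` are illustrative, and UTC is an assumed time zone. The last line relies on `roll = "nearest"`, which more recent data.table releases accept and which does the same thing in a single step; treat that as an assumption about the installed version.

```r
library(data.table)

# Hypothetical helper; tz = "UTC" is an assumption
parse_time <- function(s) as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")

location <- data.table(
  x = c(1, 2, 3, 2),
  y = c(3, 3, 4, 5),
  dt.time = parse_time(c("01/02/2003 18:00", "01/02/2003 19:00",
                         "01/03/2003 23:00", "01/04/2003 02:00"))
)
weather <- data.table(
  dt.time = parse_time(c("01/01/2003 13:00", "01/02/2003 16:34",
                         "01/02/2003 20:55", "01/02/2003 21:33",
                         "01/03/2003 00:22", "01/03/2003 14:55",
                         "01/03/2003 18:00", "01/03/2003 23:44",
                         "01/04/2003 01:55")),
  temp = c(12, 10, 14, 14, 13, 12, 10, 2, 6),
  wind = c(15, 16, 22, 22, 19, 12, 12, 33, 22)
)
setkey(weather, dt.time)

# Step 1: row number of the prevailing weather observation per location row
i_prev <- weather[location, on = "dt.time", roll = TRUE, which = TRUE]
i_prev[is.na(i_prev)] <- 1L   # location time earlier than every weather time

# Step 2: compare with the next weather row and keep whichever is closer
i_next <- pmin(i_prev + 1L, nrow(weather))
gap <- function(i) {
  abs(as.numeric(difftime(location$dt.time, weather$dt.time[i], units = "secs")))
}
nearest <- ifelse(gap(i_next) < gap(i_prev), i_next, i_prev)

result <- cbind(location, weather[nearest, .(weather.time = dt.time, temp, wind)])
print(result)

# One-step equivalent on data.table versions that support roll = "nearest"
i_near <- weather[location, on = "dt.time", roll = "nearest", which = TRUE]
```

On the question's data both routes select weather rows 2, 3, 8 and 9.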

Answer 2

I needed to bring the data in with date and time as separate columns, and then paste and format them:

location$dt.time <- as.POSIXct(paste(location$date, location$time), 
                                 format="%m/%d/%Y %H:%M")

And the same for weather.
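For readers who want to reproduce this, here is a minimal sketch of the setup, assuming the question's tables are rebuilt by hand as data frames with separate `date` and `time` text columns. The `tz = "UTC"` argument is an added assumption to avoid daylight-saving surprises.

```r
# Rebuild the question's tables with date and time as separate text columns
location <- data.frame(
  x = c(1, 2, 3, 2),
  y = c(3, 3, 4, 5),
  date = c("01/02/2003", "01/02/2003", "01/03/2003", "01/04/2003"),
  time = c("18:00", "19:00", "23:00", "02:00"),
  stringsAsFactors = FALSE
)
weather <- data.frame(
  date = c("01/01/2003", "01/02/2003", "01/02/2003", "01/02/2003",
           "01/03/2003", "01/03/2003", "01/03/2003", "01/03/2003",
           "01/04/2003"),
  time = c("13:00", "16:34", "20:55", "21:33", "00:22",
           "14:55", "18:00", "23:44", "01:55"),
  temp = c(12, 10, 14, 14, 13, 12, 10, 2, 6),
  wind = c(15, 16, 22, 22, 19, 12, 12, 33, 22),
  stringsAsFactors = FALSE
)

# Paste date and time together, then parse into POSIXct in both tables
location$dt.time <- as.POSIXct(paste(location$date, location$time),
                               format = "%m/%d/%Y %H:%M", tz = "UTC")
weather$dt.time <- as.POSIXct(paste(weather$date, weather$time),
                              format = "%m/%d/%Y %H:%M", tz = "UTC")
str(weather)
```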

Then, for each value of dt.time in location, find the entry in weather with the smallest absolute time difference:

 sapply(location$dt.time, function(x) which.min(abs(difftime(x, weather$dt.time))))
# [1] 2 3 8 9
 cbind(location, weather[ sapply(location$dt.time, 
                      function(x) which.min(abs(difftime(x, weather$dt.time)))), ])

  x y       date  time             dt.time       date  time temp wind             dt.time
2 1 3 01/02/2003 18:00 2003-01-02 18:00:00 01/02/2003 16:34   10   16 2003-01-02 16:34:00
3 2 3 01/02/2003 19:00 2003-01-02 19:00:00 01/02/2003 20:55   14   22 2003-01-02 20:55:00
8 3 4 01/03/2003 23:00 2003-01-03 23:00:00 01/03/2003 23:44    2   33 2003-01-03 23:44:00
9 2 5 01/04/2003 02:00 2003-01-04 02:00:00 01/04/2003 01:55    6   22 2003-01-04 01:55:00

 cbind(location, weather[ 
                  sapply(location$dt.time, 
                    function(x) which.min(abs(difftime(x, weather$dt.time)))), ])[ #pick columns
                          c(1,2,5,8,9,10)]

  x y             dt.time temp wind           dt.time.1
2 1 3 2003-01-02 18:00:00   10   16 2003-01-02 16:34:00
3 2 3 2003-01-02 19:00:00   14   22 2003-01-02 20:55:00
8 3 4 2003-01-03 23:00:00    2   33 2003-01-03 23:44:00
9 2 5 2003-01-04 02:00:00    6   22 2003-01-04 01:55:00
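The sapply call above compares every location time with every weather time, which is fine for small tables. For larger ones, a vectorized base R alternative could be sketched with findInterval, assuming the weather times are sorted in ascending order. The names `parse_time`, `loc_time`, `wx_time`, `lo` and `hi` are illustrative, and UTC is an assumed time zone.

```r
# Hypothetical helper; tz = "UTC" is an assumption
parse_time <- function(s) as.POSIXct(s, format = "%m/%d/%Y %H:%M", tz = "UTC")

loc_time <- parse_time(c("01/02/2003 18:00", "01/02/2003 19:00",
                         "01/03/2003 23:00", "01/04/2003 02:00"))
# findInterval needs this vector sorted ascending, as it is in the question
wx_time <- parse_time(c("01/01/2003 13:00", "01/02/2003 16:34",
                        "01/02/2003 20:55", "01/02/2003 21:33",
                        "01/03/2003 00:22", "01/03/2003 14:55",
                        "01/03/2003 18:00", "01/03/2003 23:44",
                        "01/04/2003 01:55"))

t_loc <- as.numeric(loc_time)   # seconds since the epoch
t_wx  <- as.numeric(wx_time)

# Index of the last weather time at or before each location time (0 if none)
lo <- pmax(findInterval(t_loc, t_wx), 1L)
# Index of the following weather time, capped at the last row
hi <- pmin(lo + 1L, length(t_wx))

# Keep whichever neighbour is closer in time
nearest <- ifelse(abs(t_wx[hi] - t_loc) < abs(t_loc - t_wx[lo]), hi, lo)
print(nearest)
# [1] 2 3 8 9
```

The resulting index vector can be used in place of the sapply result, for example as weather[nearest, ].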

My results look a bit different from yours, but another reader has already questioned your ability to do the matching correctly by hand.

Join data based on a date/time range in R
