Return values if a date falls between two dates in a second table

Date: 2017-12-08 20:45:50

Tags: r dplyr data.table lubridate

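I am trying to determine whether a date-time falls between a begin date-time and an end date-time and, if it does, return the values associated with that interval. I have this working in data.table but would like to do it in dplyr.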

So if you have the date-times:

2017-07-01 02:15:00 
2017-07-01 02:30:00

and look them up in a second table:

begin,                end,                  value1,  value2
2017-07-01 00:01:00,  2017-07-01 01:00:00,  1,       2
2017-07-01 01:01:00,  2017-07-01 02:00:00,  3,       4
2017-07-01 02:01:00,  2017-07-01 03:00:00,  5,       6

the result should be:

date                value1   value2
2017-07-01 02:15:00    5        6     
2017-07-01 02:30:00    5        6  

There are many lookup values, so in practice there will be several hundred date-times to look up.

I am using data.table but would like to use dplyr instead, to reduce the number of packages I depend on. This is what I have so far:

library(tidyverse)
library(lubridate)
library(data.table)

dates <- read_csv("date1.csv") %>% 
  mutate(date = as_datetime(date))

lookup <- read_csv("lookup.csv") %>% 
  mutate(begin = as_datetime(begin),
         end = as_datetime(end))

dates <- data.table(dates)
lookup <- data.table(lookup)
setkey(lookup, begin, end)
dates[, c("begin", "end") := date]  
test.df <- foverlaps(dates, lookup)[, c("date", "value1", "value2"), 
                                        with = FALSE] 

I was thinking of using something like this:

test <- dates %>% rowwise() %>%
  mutate(value1 = ifelse( lookup$begin >= date & date <= lookup$end, lookup$value1, ""))

Here are the dates to look up:

    dates <- structure(list(date = structure(c(1498867200, 1498868100, 1498869000, 
1498869900, 1498870800, 1498871700, 1498872600, 1498873500, 1498874400, 
1498875300, 1498876200, 1498877100, 1498878000, 1498878900, 1498879800, 
1498880700, 1498881600, 1498882500), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = "date", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))

The lookup table:

    lookup <- structure(list(begin = structure(c(1498867260, 1498870860, 1498874460, 
1498878060, 1498881660, 1498885260, 1498888860, 1498892460, 1498896060
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), end = structure(c(1498870800, 
1498874400, 1498878000, 1498881600, 1498885200, 1498888800, 1498892400, 
1498896000, 1498899600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    value1 = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L), value2 = c(2L, 
    4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L)), .Names = c("begin", 
"end", "value1", "value2"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L))

1 Answer:

Answer 0 (score: 0)

You can try the following:

library(tidyverse)
library(lubridate)

dates <- dates %>% 
  mutate(match_date = format(date, "%Y-%m-%d"), 
         match_hour = hour(date - minutes(1)))

lookup <- lookup %>% 
  mutate(match_date = format(begin, "%Y-%m-%d"), 
         match_hour = hour(begin))


left_join(dates, lookup, by = c("match_date", "match_hour")) %>% 
  filter(date >= begin & date <= end) %>% 
  select(- match_date, - match_hour) %>% 
  head()

# A tibble: 6 x 5
#                  date               begin                 end value1 value2
#                <dttm>              <dttm>              <dttm>  <int>  <int>
# 1 2017-07-01 00:15:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 2 2017-07-01 00:30:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 3 2017-07-01 00:45:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 4 2017-07-01 01:00:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 5 2017-07-01 01:15:00 2017-07-01 01:01:00 2017-07-01 02:00:00      3      4
# 6 2017-07-01 01:30:00 2017-07-01 01:01:00 2017-07-01 02:00:00      3      4

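First, I extract the calendar date and the hour of the day to use as join keys. I subtract one minute from the dates in the dates table because the end times in your lookup table fall exactly on the hour (e.g. 01:00:00). Since I join on the hour of the begin time, a date of 01:00:00 has to map to hour 0 rather than hour 1, and subtracting the minute achieves that.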
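To make the effect of the one-minute shift concrete, here is a minimal sketch. The vector `x` is a made-up example, not part of the original data; it only shows how `hour()` behaves before and after the subtraction.

```r
library(lubridate)

# Two illustrative timestamps (UTC): one exactly on the hour, one inside the hour
x <- as_datetime(c("2017-07-01 01:00:00", "2017-07-01 01:15:00"))

hour(x)               # 1 1 : 01:00:00 would be keyed to the 01:01-02:00 interval
hour(x - minutes(1))  # 0 1 : 01:00:00 is now keyed to the 00:01-01:00 interval
```

Without the shift, 01:00:00 would be joined to the interval starting at 01:01:00 and then dropped by the filter, because it lies before that interval's begin time.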

Then I left_join dates and lookup, and filter according to your required criteria.
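As an alternative sketch, and assuming dplyr 1.1.0 or later is available (a version newer than this answer), the range condition can be expressed directly in the join with `join_by(between(...))`, so no helper key columns are needed. The small `dates` and `lookup` tibbles below are stand-ins built from the values shown in the question, not the full data.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0 (assumption about your setup)

# Small stand-ins for the question's tables (UTC timestamps)
dates <- tibble(
  date = as.POSIXct(c("2017-07-01 00:00:00", "2017-07-01 01:00:00",
                      "2017-07-01 02:15:00", "2017-07-01 02:30:00"),
                    tz = "UTC")
)
lookup <- tibble(
  begin  = as.POSIXct(c("2017-07-01 00:01:00", "2017-07-01 01:01:00",
                        "2017-07-01 02:01:00"), tz = "UTC"),
  end    = as.POSIXct(c("2017-07-01 01:00:00", "2017-07-01 02:00:00",
                        "2017-07-01 03:00:00"), tz = "UTC"),
  value1 = c(1L, 3L, 5L),
  value2 = c(2L, 4L, 6L)
)

# Keep every row of dates and attach the interval with begin <= date <= end;
# dates that fall in no interval get NA for value1 and value2
result <- dates %>%
  left_join(lookup, by = join_by(between(date, begin, end))) %>%
  select(date, value1, value2)

result
```

Because this is a left join, dates outside every interval (00:00:00 here) are kept with NA values instead of being filtered out, and the approach does not rely on the intervals lining up with clock hours.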