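I am trying to determine whether a datetime falls between a begin datetime and an end datetime, and to return the values that belong to the matching interval. This works in data.table, but I would like to do it in dplyr.
So if you have the datetimes: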
2017-07-01 02:15:00
2017-07-01 02:30:00
look them up in a second table:
begin, end, value1, value2
2017-07-01 00:01:00, 2017-07-01 01:00:00, 1, 2
2017-07-01 01:01:00, 2017-07-01 02:00:00, 3, 4
2017-07-01 02:01:00, 2017-07-01 03:00:00, 5, 6
and return:
date value1 value2
2017-07-01 02:15:00 5 6
2017-07-01 02:30:00 5 6
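There are many values to look up, so there will be several hundred lookup datetimes.
I currently use data.table but would like to use dplyr to reduce the number of packages I depend on. This is what I have so far: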
library(tidyverse)
library(lubridate)
library(data.table)
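# Read both tables and parse the datetime columns
dates <- read_csv("date1.csv") %>%
  mutate(date = as_datetime(date))
lookup <- read_csv("lookup.csv") %>%
  mutate(begin = as_datetime(begin),
         end = as_datetime(end))
# Convert to data.table and key the lookup table on its interval columns
dates <- data.table(dates)
lookup <- data.table(lookup)
setkey(lookup, begin, end)
# Treat each date as a zero-length interval so foverlaps() can match it
dates[, c("begin", "end") := date]
test.df <- foverlaps(dates, lookup)[, c("date", "value1", "value2"),
                                    with = FALSE]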
I was thinking of using something like:
test <- dates %>% rowwise() %>%
  mutate(value1 = ifelse( lookup$begin >= date & date <= lookup$end, lookup$value1, ""))
Here are the dates to look up:
dates <- structure(list(date = structure(c(1498867200, 1498868100, 1498869000,
1498869900, 1498870800, 1498871700, 1498872600, 1498873500, 1498874400,
1498875300, 1498876200, 1498877100, 1498878000, 1498878900, 1498879800,
1498880700, 1498881600, 1498882500), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = "date", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))
The lookup table:
lookup <- structure(list(begin = structure(c(1498867260, 1498870860, 1498874460,
1498878060, 1498881660, 1498885260, 1498888860, 1498892460, 1498896060
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), end = structure(c(1498870800,
1498874400, 1498878000, 1498881600, 1498885200, 1498888800, 1498892400,
1498896000, 1498899600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
value1 = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L), value2 = c(2L,
4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L)), .Names = c("begin",
"end", "value1", "value2"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L))
Answer 0 (score: 0)
You could try the following:
library(tidyverse)
library(lubridate)
dates <- dates %>%
  mutate(match_date = format(date, "%Y-%m-%d"),
         match_hour = hour(date - minutes(1)))
lookup <- lookup %>%
  mutate(match_date = format(begin, "%Y-%m-%d"),
         match_hour = hour(begin))
left_join(dates, lookup, by = c("match_date", "match_hour")) %>%
  filter(date >= begin & date <= end) %>%
  select(-match_date, -match_hour) %>%
  head()
# A tibble: 6 x 5
# date begin end value1 value2
# <dttm> <dttm> <dttm> <int> <int>
# 1 2017-07-01 00:15:00 2017-07-01 00:01:00 2017-07-01 01:00:00 1 2
# 2 2017-07-01 00:30:00 2017-07-01 00:01:00 2017-07-01 01:00:00 1 2
# 3 2017-07-01 00:45:00 2017-07-01 00:01:00 2017-07-01 01:00:00 1 2
# 4 2017-07-01 01:00:00 2017-07-01 00:01:00 2017-07-01 01:00:00 1 2
# 5 2017-07-01 01:15:00 2017-07-01 01:01:00 2017-07-01 02:00:00 3 4
# 6 2017-07-01 01:30:00 2017-07-01 01:01:00 2017-07-01 02:00:00 3 4
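First, I extract the date and the hour to match on. I subtract one minute from the dates in the dates table because the end times in your lookup table fall exactly on the hour (for example, 01:00:00). Since I want to join on the begin time to get the correct matching hour (0 in this case), I subtract the minute.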
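The effect of subtracting one minute can be checked in isolation. This is only a minimal sketch; the vector x holds made-up timestamps for illustration and is not part of the question's data.

```r
library(lubridate)

# Illustrative timestamps (hypothetical): one exactly on the hour, one inside the hour
x <- as_datetime(c("2017-07-01 01:00:00", "2017-07-01 01:15:00"))

# Without the shift, 01:00:00 gets hour 1 and would be keyed to the 01:01-02:00 interval
hour_raw <- hour(x)

# With the shift, 01:00:00 gets hour 0 and is keyed to the 00:01-01:00 interval it ends
hour_shifted <- hour(x - minutes(1))

hour_raw
hour_shifted
```

One thing to keep in mind with this key: match_date is taken from the unshifted date, so a timestamp at exactly 00:00:00 would pair the new day's date with hour 23 and would probably not match an interval that began on the previous day.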
Then I left_join dates and lookup, and filter according to your required criteria.
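As an alternative to building match keys by hand, newer dplyr versions can express the interval condition in the join itself. The sketch below assumes dplyr 1.1.0 or later, where join_by() and its between() helper are available. The two small data frames are stand-ins built from the example rows in the question, not the full data.

```r
library(dplyr)

# Stand-in tables using the example rows from the question
dates <- data.frame(
  date = as.POSIXct(c("2017-07-01 02:15:00", "2017-07-01 02:30:00"), tz = "UTC")
)
lookup <- data.frame(
  begin  = as.POSIXct(c("2017-07-01 00:01:00", "2017-07-01 01:01:00",
                        "2017-07-01 02:01:00"), tz = "UTC"),
  end    = as.POSIXct(c("2017-07-01 01:00:00", "2017-07-01 02:00:00",
                        "2017-07-01 03:00:00"), tz = "UTC"),
  value1 = c(1L, 3L, 5L),
  value2 = c(2L, 4L, 6L)
)

# Keep every date and attach the lookup row whose closed interval
# [begin, end] contains it; dates with no interval get NA values
result <- dates %>%
  left_join(lookup, by = join_by(between(date, begin, end))) %>%
  select(date, value1, value2)

result
```

With these stand-in rows, both dates fall inside the 02:01:00 to 03:00:00 interval, so each should come back with value1 = 5 and value2 = 6, as in the expected output in the question. Because the condition compares the timestamps directly, no helper columns for date and hour are needed.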