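I have a data frame of events that are linked together by a code. Each event has a count, a date, and a time. For a given code, I want to find the count whose date and time are closest to a given date and time. For example, with the following data frame: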
x.df <- structure(list(id = 1:20, code = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), count = c(2L,
3L, 5L, 7L, 8L, 1L, 2L, 7L, 9L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L,
4L, 8L, 8L), date = structure(c(1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L,
5L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L), .Label = c("2019-01-01",
"2019-01-02", "2019-01-03", "2019-02-11", "2019-02-12", "2019-04-22",
"2019-04-23", "2019-04-24", "2019-04-25"), class = "factor"),
time = structure(c(11L, 12L, 10L, 13L, 14L, 1L, 2L, 5L, 7L,
17L, 19L, 2L, 3L, 9L, 18L, 4L, 6L, 8L, 15L, 16L), .Label = c("01:01:01",
"02:01:02", "02:11:02", "03:01:03", "07:01:07", "09:01:04",
"09:01:09", "10:01:04", "12:01:02", "12:10:01", "12:12:12",
"12:34:23", "13:15:30", "14:19:23", "18:01:08", "19:01:08",
"22:02:12", "23:01:03", "23:02:12"), class = "factor")), class = "data.frame", row.names = c(NA,
-20L))
I want a function
findcount(code,date,time)
such that
findcount(1,"2019-01-02","12:00:00") = 5
findcount(2,"2019-02-02","14:10:23") = 1
findcount(3,"2019-04-29","16:10:00") = 8
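I tried subsetting the data, sorting it, and then computing time differences, but it did not work. There may also be a more efficient approach than the one I am considering. Thanks.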
Answer 0 (score: 2)
I wrote a function that works for your example. First, I added a column to the data frame that combines the date and the time:
# Create a column that combines the date and time into a single date-time value.
# POSIXct is used because it stores one number per row, which suits data frame columns.
x.df$DateAndTime <- as.POSIXct(paste(x.df$date, x.df$time))
Then use the following function:
findcount <- function(code, date, time, x.df) {
  # Keep only the rows for the requested code
  # (named `rows` to avoid shadowing base::subset)
  rows <- x.df[x.df$code == code, ]
  # Create a date-time object for the input date and time
  target <- as.POSIXct(paste(date, time))
  # Absolute gap, in seconds, between every event and the target
  differences <- abs(as.numeric(difftime(rows$DateAndTime, target, units = "secs")))
  # Return the count of the closest event
  rows$count[which.min(differences)]
}
For a given code, this returns the count that corresponds to the closest date and time:
findcount(1, "2019-01-02", "12:00:00", x.df)  # 5
findcount(2, "2019-02-02", "14:10:23", x.df)  # 1
findcount(3, "2019-04-29", "16:10:00", x.df)  # 8
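Note that the format used to combine the date and time into a single object is quite specific (see this description), but fortunately the format you are using works without modification.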
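The format dependence mentioned above can be made explicit. The following is a minimal sketch, not part of the original answer: it passes `format` and `tz` to `as.POSIXct()` so that parsing does not rely on defaults or on the session time zone, and it returns `NA` for a code that has no rows. The names `events`, `parse_stamp`, and `find_count_explicit` are hypothetical, `events` holds only the five code-1 rows of the example data, and the choice of UTC is an assumption.

```r
# Illustrative subset: the five code-1 rows from the example data
events <- data.frame(
  code  = c(1L, 1L, 1L, 1L, 1L),
  count = c(2L, 3L, 5L, 7L, 8L),
  date  = c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-03"),
  time  = c("12:12:12", "12:34:23", "12:10:01", "13:15:30", "14:19:23"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "date time" with an explicit format and time zone
# (UTC is an assumption; use the zone your data was recorded in)
parse_stamp <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Hypothetical variant of findcount() that parses on the fly
find_count_explicit <- function(code, date, time, df) {
  rows <- df[df$code == code, ]
  if (nrow(rows) == 0L) return(NA_integer_)  # no events for this code
  target <- parse_stamp(date, time)
  # Absolute gap in seconds between each event and the target
  gaps <- abs(as.numeric(difftime(parse_stamp(rows$date, rows$time),
                                  target, units = "secs")))
  rows$count[which.min(gaps)]
}

find_count_explicit(1, "2019-01-02", "12:00:00", events)
```

If two events are equally close, `which.min()` picks the first one in row order, so sort the data first if a different tie-break is needed.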
Answer 1 (score: 1)
You can use the ymd_hms() function from library(lubridate) and compute the difference between the two date-times.
Example:
library(dplyr)      # filter(), mutate(), arrange(), first(), %>%
library(lubridate)  # ymd_hms()
example_code = 1
example_date = "2019-01-02"
example_time = "12:00:00"
x.df %>%
filter(code == example_code) %>%
mutate(hours = paste(date, time) %>% ymd_hms()) %>%
mutate(diff = abs(hours - ymd_hms(paste(example_date, example_time)))) %>%
arrange(diff) %>% print() %>%
# id code count date time hours diff
# 1 3 1 5 2019-01-02 12:10:01 2019-01-02 12:10:01 10.01667 mins
# 2 4 1 7 2019-01-02 13:15:30 2019-01-02 13:15:30 75.50000 mins
# 3 2 1 3 2019-01-01 12:34:23 2019-01-01 12:34:23 1405.61667 mins
# 4 1 1 2 2019-01-01 12:12:12 2019-01-01 12:12:12 1427.80000 mins
# 5 5 1 8 2019-01-03 14:19:23 2019-01-03 14:19:23 1579.38333 mins
.$count %>%
first()
# [1] 5
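The pipeline above works on hard-coded example variables. As a rough sketch, and not something the original answer provides, the same steps can be wrapped in a function shaped like the one asked for in the question. The names `events` and `find_count_dplyr` are hypothetical, `events` again holds only the code-1 rows of the example data, and the gap is converted to seconds with `difftime()` so that sorting does not depend on automatically chosen units.

```r
library(dplyr)
library(lubridate)

# Illustrative subset: the five code-1 rows from the example data
events <- data.frame(
  code  = c(1L, 1L, 1L, 1L, 1L),
  count = c(2L, 3L, 5L, 7L, 8L),
  date  = c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02", "2019-01-03"),
  time  = c("12:12:12", "12:34:23", "12:10:01", "13:15:30", "14:19:23"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper around the filter / mutate / arrange steps
find_count_dplyr <- function(df, target_code, target_date, target_time) {
  target <- ymd_hms(paste(target_date, target_time))
  df %>%
    filter(code == target_code) %>%
    mutate(
      stamp     = ymd_hms(paste(date, time)),
      # Absolute gap to the target, in seconds
      diff_secs = abs(as.numeric(difftime(stamp, target, units = "secs")))
    ) %>%
    arrange(diff_secs) %>%   # closest event first
    pull(count) %>%
    first()
}

find_count_dplyr(events, 1, "2019-01-02", "12:00:00")
```

Sorting the whole group is more work than strictly needed to find one minimum, but it keeps the function close to the pipeline shown in the answer.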