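Suppose datetimes1 holds irregularly spaced datetimes and datetimes2 holds regularly spaced ones. datetimes1 has some missing datetimes, for example 5:10, as shown in the first table.
What I want is to match datetimes1 against datetimes2 so that each datetimes1 value sits next to the datetimes2 value it is close to, and every datetimes1 value ends up in the correct row.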
First, I tried rounding datetimes1 to the nearest 5 minutes and matching the result against datetimes2, but some datetimes are off by 3 minutes, so they round to the wrong value.
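To illustrate the rounding problem, here is a minimal sketch on four of the sample values. The name rounded is an illustrative one and is not used in the code further down; the rounding is done on epoch seconds, which is only one of several ways to round a POSIXct value.

```r
# Four of the irregular sample times
irregulars <- as.POSIXct(c("2014-07-24 05:23", "2014-07-24 05:27",
                           "2014-07-24 05:33", "2014-07-24 05:37"), tz = "GMT")

# Round to the nearest multiple of 300 seconds (5 minutes)
rounded <- as.POSIXct(round(as.numeric(irregulars) / 300) * 300,
                      origin = "1970-01-01", tz = "GMT")

format(rounded, "%H:%M")
# 05:23 and 05:27 both land on 05:25; 05:33 and 05:37 both land on 05:35
any(duplicated(rounded))
```

Two irregular times collapse onto the same 5-minute slot, so plain rounding cannot assign each of them its own row.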
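Next, I tried matching datetimes1 to datetimes2 iteratively: starting with tolerance set to 0 (an exact match), each pass of the loop increases tolerance by one minute and matches the datetimes1 values that are still unmatched to the datetimes2 values that are off by at most that tolerance.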
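The problem with this approach is that 5:33 and 5:37 are both 2 minutes away from 5:35, so 5:33 is matched to 5:35 first and 5:37 then never makes it into the table. See the second table for the result I get with the code below.
Do you know how to solve this?
What I would like to see: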
datetimes1 datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2 <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4 2014-07-24 05:23:00 2014-07-24 05:20:00
5 2014-07-24 05:27:00 2014-07-24 05:25:00
6 2014-07-24 05:33:00 2014-07-24 05:30:00
7 2014-07-24 05:37:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00
But this is what I get:
datetimes1 datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2 <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4 <NA> 2014-07-24 05:20:00
5 2014-07-24 05:23:00 2014-07-24 05:25:00
6 2014-07-24 05:27:00 2014-07-24 05:30:00
7 2014-07-24 05:33:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00
Here is my code:
irregulars <- c("2014-07-24 05:05",
                "2014-07-24 05:15",
                "2014-07-24 05:23",
                "2014-07-24 05:27",
                "2014-07-24 05:33",
                "2014-07-24 05:37",
                "2014-07-24 05:41",
                "2014-07-24 05:45")
# Use `=` so the column is actually named "datetimes"
df1 <- data.frame(datetimes = as.POSIXct(irregulars, "GMT"))
regulars <- c("2014-07-24 05:05",
              "2014-07-24 05:10",
              "2014-07-24 05:15",
              "2014-07-24 05:20",
              "2014-07-24 05:25",
              "2014-07-24 05:30",
              "2014-07-24 05:35",
              "2014-07-24 05:40",
              "2014-07-24 05:45")
df2 <- setNames(data.frame(matrix(NA, length(regulars), 2)), c("datetimes1", "datetimes2"))
df2$datetimes2 <- as.POSIXct(regulars, "GMT")
# Match irregulars to regulars, widening the tolerance (in minutes) on each pass
for (tolerance in 0:3) {
  for (idx in which(!df1$datetimes %in% df2$datetimes1)) {
    dt <- abs(difftime(df2$datetimes2, df1$datetimes[idx], "GMT", "mins"))
    # Smallest distance to a regular time that is still unmatched
    dt.min <- min(dt[is.na(df2$datetimes1)])
    if (dt.min > tolerance) next
    idx2 <- which(dt == dt.min)
    df2$datetimes1[idx2] <- df1$datetimes[idx]
  }
}
df2$datetimes1 <- as.POSIXct(df2$datetimes1, "GMT", origin = "1970-01-01 00:00:00")
Answer 0: (score: 1)
Here is one approach. Suppose our matching algorithm finds, from a set of candidates, the time with the smallest absolute difference from the target x, on the condition that the difference must be below a tolerance level (say 5 minutes, or 300 seconds):
closest <- function(x, candidates, tol = 300) {
  timediff <- abs(difftime(x, candidates, units = "secs"))
  if (all(timediff >= tol)) return(NA)
  candidates[which.min(timediff)]
}
The candidates in our case are the "irregulars" and our targets are the "regulars". The main idea is to iterate over the "regulars", and whenever we find a match in the candidate set, we remove it from the candidates:
candidates <- irregulars
out <- sapply(regulars, function(x) {
  matched <- closest(x, candidates, tol = 300)
  candidates <<- setdiff(candidates, matched)
  matched
})
Here is the full MWE. First set up the time vectors:
irregulars <- c("2014-07-24 05:05",
                "2014-07-24 05:15",
                "2014-07-24 05:23",
                "2014-07-24 05:27",
                "2014-07-24 05:33",
                "2014-07-24 05:37",
                "2014-07-24 05:41",
                "2014-07-24 05:45")
regulars <- c("2014-07-24 05:05",
              "2014-07-24 05:10",
              "2014-07-24 05:15",
              "2014-07-24 05:20",
              "2014-07-24 05:25",
              "2014-07-24 05:30",
              "2014-07-24 05:35",
              "2014-07-24 05:40",
              "2014-07-24 05:45")
Define the closest function and iterate:
closest <- function(x, candidates, tol = 300) {
  timediff <- abs(difftime(x, candidates, units = "secs"))
  if (all(timediff >= tol)) return(NA)
  candidates[which.min(timediff)]
}
candidates <- irregulars
out <- sapply(regulars, function(x) {
  matched <- closest(x, candidates, tol = 300)
  candidates <<- setdiff(candidates, matched)
  matched
})
out then holds, for each regular time, the matched irregular time, or NA where no remaining candidate is within the tolerance.
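As a variation on the greedy idea in this answer, the following sketch works on POSIXct values and tracks used candidates with a logical vector instead of `<<-` and setdiff. The names match_greedy, tol_secs, used and result are illustrative assumptions, not part of the answer's code, and the sample data is rebuilt here so the block runs on its own.

```r
# Greedy nearest-match sketch: for each target (in order), pick the closest
# still-unused candidate within tol_secs and mark it as used.
# match_greedy and tol_secs are hypothetical names chosen for this example.
match_greedy <- function(targets, candidates, tol_secs = 300) {
  used <- rep(FALSE, length(candidates))
  idx <- rep(NA_integer_, length(targets))
  for (i in seq_along(targets)) {
    d <- abs(as.numeric(difftime(targets[i], candidates, units = "secs")))
    d[used] <- Inf                       # matched candidates are no longer eligible
    j <- which.min(d)
    if (length(j) == 1 && d[j] < tol_secs) {
      idx[i] <- j
      used[j] <- TRUE
    }
  }
  idx                                    # index into candidates, NA if no match
}

regulars <- seq(as.POSIXct("2014-07-24 05:05", tz = "GMT"),
                as.POSIXct("2014-07-24 05:45", tz = "GMT"), by = "5 min")
irregulars <- as.POSIXct(c("2014-07-24 05:05", "2014-07-24 05:15",
                           "2014-07-24 05:23", "2014-07-24 05:27",
                           "2014-07-24 05:33", "2014-07-24 05:37",
                           "2014-07-24 05:41", "2014-07-24 05:45"), tz = "GMT")

idx <- match_greedy(regulars, irregulars)
result <- data.frame(datetimes1 = irregulars[idx], datetimes2 = regulars)
```

Returning indices rather than values keeps the POSIXct class intact when building result. Note that a greedy pass in target order is not guaranteed to give the best overall pairing for arbitrary data; it happens to give the desired rows for this sample.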
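Answer 1: (score: 0)
Assuming each irregular time is less than 5 minutes away from its regular time, you can walk through both vectors together and accept a match only when the difference is under 5 minutes: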
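i <- 1
j <- 1
while (i <= nrow(df2) && j <= nrow(df1)) {
  # Signed difference in minutes: regular time minus irregular time
  d <- difftime(df2$datetimes2[i], df1$datetimes[j], tz = "GMT", units = "mins")
  if (abs(d) < 5) {
    # Close enough: pair them and advance both pointers
    df2$datetimes1[i] <- df1$datetimes[j]
    j <- j + 1
    i <- i + 1
  } else if (d > 0) {
    # Irregular time is too early for this regular time: skip it
    j <- j + 1
  } else {
    # No irregular time for this regular time: leave it as NA
    i <- i + 1
  }
}
df2$datetimes1 <- as.POSIXct(df2$datetimes1, "GMT", origin = "1970-01-01 00:00:00")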
> df2
datetimes1 datetimes2
1 2014-07-24 05:05:00 2014-07-24 05:05:00
2 <NA> 2014-07-24 05:10:00
3 2014-07-24 05:15:00 2014-07-24 05:15:00
4 2014-07-24 05:23:00 2014-07-24 05:20:00
5 2014-07-24 05:27:00 2014-07-24 05:25:00
6 2014-07-24 05:33:00 2014-07-24 05:30:00
7 2014-07-24 05:37:00 2014-07-24 05:35:00
8 2014-07-24 05:41:00 2014-07-24 05:40:00
9 2014-07-24 05:45:00 2014-07-24 05:45:00
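The loop in this answer relies on both vectors being sorted in ascending order. A self-contained sketch that makes this assumption explicit could look like the following; match_sorted, tol_mins, regular and irregular are hypothetical names introduced for illustration, and the sample times are rebuilt as offsets from 05:05.

```r
# Two-pointer sketch over two ascending POSIXct vectors.
# match_sorted and tol_mins are illustrative names, not from the answer.
match_sorted <- function(regular, irregular, tol_mins = 5) {
  # The walk below is only valid when both inputs are sorted ascending
  stopifnot(!is.unsorted(as.numeric(regular)),
            !is.unsorted(as.numeric(irregular)))
  idx <- rep(NA_integer_, length(regular))
  i <- 1L
  j <- 1L
  while (i <= length(regular) && j <= length(irregular)) {
    d <- as.numeric(difftime(regular[i], irregular[j], units = "mins"))
    if (abs(d) < tol_mins) {      # close enough: pair them, advance both
      idx[i] <- j
      i <- i + 1L
      j <- j + 1L
    } else if (d > 0) {           # irregular time is too early: skip it
      j <- j + 1L
    } else {                      # regular time has no partner: stays NA
      i <- i + 1L
    }
  }
  idx
}

regular <- seq(as.POSIXct("2014-07-24 05:05", tz = "GMT"),
               as.POSIXct("2014-07-24 05:45", tz = "GMT"), by = "5 min")
# Offsets in minutes from 05:05 reproduce the irregular sample times
irregular <- regular[1] + 60 * c(0, 10, 18, 22, 28, 32, 36, 40)

idx <- match_sorted(regular, irregular)
matched <- data.frame(datetimes1 = irregular[idx], datetimes2 = regular)
```

Because the comparison is a strict `<`, a pair exactly 5 minutes apart (such as 05:10 and 05:15) is not matched, which is what leaves the 05:10 row as NA.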