I have two vectors of timestamps, A and B.
A = c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.062", "2015-11-02 08:30:00.952")
B = c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029", "2015-11-02 08:30:00.030", "2015-11-02 08:30:00.045", "2015-11-02 08:30:00.048", "2015-11-02 08:30:00.054", "2015-11-02 08:30:00.056", "2015-11-02 08:30:00.078", "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.079", "2015-11-02 08:30:00.246", "2015-11-02 08:30:00.247", "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.251", "2015-11-02 08:30:00.252")
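I index the elements of vectors A and B by i and j respectively, so in this case $i \in \{1,...,5\}$ and $j \in \{1,...,15\}$. The desired output matrix Z is an i × j matrix, which in this case is 5 by 15.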
For every i and j, I want to check whether the time intervals overlap and, if they do, record 1 in $Z_{i,j}$. So if the intervals $[A_i, A_{i+1}]$ and $[B_j, B_{j+1}]$ overlap at all, I record 1.
For example, for i = 1 and j = 1, ["2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060"] and ["2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029"] do not overlap, so the output for $Z_{1,1}$ is 0.
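To make the criterion concrete, here is a minimal base-R sketch of the check for that single pair. The helper name `parse_ts`, the choice of `tz = "UTC"`, and the treatment of the intervals as closed (touching endpoints count as an overlap) are my assumptions for illustration only.

```r
# Hypothetical helper: parse timestamps with fractional seconds into numeric seconds.
# tz = "UTC" is an assumption made only so the example is reproducible.
parse_ts <- function(x) {
  as.numeric(as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
}

a <- parse_ts(c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060"))  # [A_1, A_2]
b <- parse_ts(c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029"))  # [B_1, B_2]

# Closed intervals [a1, a2] and [b1, b2] overlap iff a1 <= b2 and b1 <= a2.
z11 <- as.integer(a[1] <= b[2] && b[1] <= a[2])
z11
```

Here the A interval starts after the B interval has already ended, so the first condition fails and the entry is 0.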
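What I currently do is loop over the whole of both vectors for every i and j, using the as.interval and int_overlaps functions from the lubridate package. If possible, though, I would like a vectorized solution to this problem. The loop is very slow, because my A and B vectors usually contain more than 10,000 elements.
I tried to build the ii-by-jj matrix with the following code, which is very inefficient: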
r1 = as.interval( strptime(as.POSIXlt((A), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[ii], strptime(as.POSIXlt((A), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[ii+1])
r2 = as.interval( strptime(as.POSIXlt((B), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[jj], strptime(as.POSIXlt((B), format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")[jj+1])
int_overlaps(r1,r2)
Putting it all together, this is what I have now. It works, but it is very slow.
library(lubridate)  # provides as.interval() and int_overlaps()

# Parse timestamp strings (with fractional seconds) into date-times
tch = function(time_vector){
  strptime(as.POSIXlt(as.character(time_vector), format = "%Y-%m-%d %H:%M:%OS",
                      tzone = "CT"), format = "%Y-%m-%d %H:%M:%OS")}

Z = matrix(0, length(A)-1, length(B)-1)
for (ii in 1:(length(A)-1)){
  for (jj in 1:(length(B)-1)){
    r1 = as.interval(tch(A)[ii], tch(A)[ii+1])
    r2 = as.interval(tch(B)[jj], tch(B)[jj+1])
    # save all overlaps and compute all the vectors
    if( int_overlaps(r1,r2) ){
      Z[ii,jj] = 1
    }
    # stop scanning this row once a run of overlaps has ended
    # (this assumes the timestamps are sorted)
    if (jj > 1){
      if (Z[ii,jj] == 0 & Z[ii,(jj-1)] == 1){
        break
      }}
  }
  print(paste(jj,ii))}  # progress output
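For reference, the kind of vectorized form I have in mind might look like the sketch below, which replaces the double loop with two calls to base R's `outer()`. It rests on several assumptions that are mine: the timestamps are parsed once into numeric seconds (the helper `parse_ts` and `tz = "UTC"` are illustrative), the intervals are treated as closed, which I believe matches `int_overlaps`, and only the first eight values of B are used to keep the example short.

```r
# Hypothetical helper: parse once, up front, instead of inside the loop.
parse_ts <- function(x) {
  as.numeric(as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"))
}

A <- c("2015-11-02 08:30:00.054", "2015-11-02 08:30:00.060",
       "2015-11-02 08:30:00.060", "2015-11-02 08:30:00.062",
       "2015-11-02 08:30:00.952")
# Only the first eight elements of B, to keep the example small.
B <- c("2015-11-02 08:30:00.016", "2015-11-02 08:30:00.029",
       "2015-11-02 08:30:00.030", "2015-11-02 08:30:00.045",
       "2015-11-02 08:30:00.048", "2015-11-02 08:30:00.054",
       "2015-11-02 08:30:00.056", "2015-11-02 08:30:00.078")

a <- parse_ts(A)
b <- parse_ts(B)

# Interval i runs from element i to element i + 1.
a_start <- a[-length(a)]; a_end <- a[-1]
b_start <- b[-length(b)]; b_end <- b[-1]

# Z[i, j] is TRUE iff a_start[i] <= b_end[j] and b_start[j] <= a_end[i].
Z <- outer(a_start, b_end, "<=") & outer(a_end, b_start, ">=")
storage.mode(Z) <- "integer"  # 0/1 instead of FALSE/TRUE, dimensions preserved
Z
```

As in the loop above, the result has length(A)-1 rows and length(B)-1 columns. For vectors with more than 10,000 elements the full matrix has on the order of 10^8 entries, so memory may become the limiting factor with this approach.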
Any help would be greatly appreciated!