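Reduce the run time of a DTW distance function

Time: 2017-05-25 09:44:59

Tags: r distance

I want to compute the DTW distances between the columns of a data matrix, but my current implementation takes a very long time. Is there another DTW implementation that takes less time?

Here is the dummy data: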

df <- data.frame(d1= rnorm(1500,10,5),d2= rnorm(1500,130,10),d3= rnorm(1500,200,10),d4= rnorm(1500,120,15),d5= rnorm(1500,700,25),d6= rnorm(1500,6,2),d7= rnorm(1500,760,15),d8= rnorm(1500,3000,08),d9= rnorm(1500,490,15),d10= rnorm(1500,321,21))

This function uses DTWDistance() from the TSdist package to return the distance matrix:

compute_dtw_distance_matrix  <- function(data_mat){
  library(TSdist) # for DTWDistance function
  cols = dim(data_mat)[2] # no. of columns or features
  dis_mat = matrix(0,nrow=cols,ncol=cols) # create result matrix
  # Compute only the lower triangular part here; the values are copied to the full matrix afterwards.
  for(row in 1:cols){
    ref_col = data_mat[,row]
    for(col in 1:row){
      comp_col = data_mat[,col]
      dis_mat[row,col] = DTWDistance(ref_col, comp_col)
    }
  }
  # convert lower_triangular to full_symmetric matrix
  for(i in 1:NROW(dis_mat)){
    for(j in 1:NCOL(dis_mat)){
      dis_mat[i,j] = dis_mat[j,i] 
    }
  }
  colnames(dis_mat) <- colnames(data_mat)
  row.names(dis_mat) <- colnames(data_mat)
  return(dis_mat)
}

Here are the run-time statistics for this function on my machine:

 system.time(compute_dtw_distance_matrix(df))
       user  system elapsed 
     21.500   3.049  24.723 
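
For context on where the time goes: DTW fills an n-by-m cost table for every pair of series, so each of the 55 DTWDistance calls made by the loops above (10 columns, lower triangle including the diagonal) touches 1500 x 1500 cells. Below is a minimal base-R sketch of the textbook recursion. It assumes absolute difference as the local cost and unit step weights, so it is not necessarily the step pattern that DTWDistance uses, and `dtw_sketch` is a hypothetical name used only for illustration.

```r
# Hypothetical helper, for illustration only: textbook DTW with
# absolute-difference local cost and unit step weights (an assumption,
# not necessarily what TSdist::DTWDistance computes).
dtw_sketch <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # acc[i + 1, j + 1] is the cheapest accumulated cost of aligning
  # x[1:i] with y[1:j]; the extra row and column hold the boundary.
  acc <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(x[i] - y[j])
      acc[i + 1, j + 1] <- cost + min(acc[i, j + 1],  # step in x only
                                      acc[i + 1, j],  # step in y only
                                      acc[i, j])      # step in both
    }
  }
  acc[n + 1, m + 1]
}

# The repeated 2 in the second series is absorbed by warping.
dtw_sketch(c(1, 2, 3), c(1, 2, 2, 3))
```

The double loop makes the quadratic cost per pair visible, which is why compiled and multi-threaded implementations pay off for series of this length.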

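Is it possible to reduce the run time of this function?

2 answers:

Answer 0: (score: 1)

You can use the parallelDist package, which supports computing multidimensional dynamic time warping distances in parallel.

The parDist function currently takes a list of matrices as its input argument when processing multidimensional data sets.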

matrices.list <- lapply(as.list(df), function(x) t(as.matrix(x)))
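
To see what this conversion produces, here is a small base-R sketch on a stand-in data frame (`small_df` and `small_list` are hypothetical names, and 5 rows are used instead of 1500). Each column becomes its own one-row matrix, so every series is stored as a single dimension with the time points running along the columns.

```r
# Stand-in for the question's df (assumption: 5 rows instead of 1500).
small_df <- data.frame(d1 = c(1, 2, 3, 4, 5), d2 = c(2, 4, 6, 8, 10))

# Same conversion as above: one 1 x n matrix per column.
small_list <- lapply(as.list(small_df), function(x) t(as.matrix(x)))

length(small_list)   # one list element per column of the data frame
dim(small_list$d1)   # one row, five columns
names(small_list)    # the column names are carried over
```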
With the following arguments, parDist produces the same output as the compute_dtw_distance_matrix function in ~0.5s using 8 threads, versus 18.44s:

library(parallelDist) # for parDist
res1 <- compute_dtw_distance_matrix(df)
res2 <- parDist(matrices.list, method = "dtw", step.pattern = "symmetric2", window.type = "none", upper = TRUE, diag = TRUE, threads = 8)
# check that both approaches give the same distance matrix
all.equal(as.matrix(res1), as.matrix(res2))

Here is a microbenchmark with different numbers of threads.

Times are in seconds. parDist(..., threads = k) stands for the parDist call above with only the threads argument changed.

 expr                                    min         lq       mean     median         uq        max neval
 compute_dtw_distance_matrix(df)  17.9424328 18.1571842 18.4489183 18.4901471 18.6747947 18.9852819    10
 parDist(..., threads = 8)         0.5280135  0.5434037  0.5547146  0.5568046  0.5657859  0.5727592    10
 parDist(..., threads = 4)         0.6869948  0.6999783  0.7266116  0.7276621  0.7446920  0.7597008    10
 parDist(..., threads = 2)         1.0311007  1.0646326  1.0796176  1.0742217  1.0812031  1.1792582    10
 parDist(..., threads = 1)         1.6967269  1.7057925  1.7358018  1.7148310  1.7695766  1.8238875    10

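Answer 1: (score: 1)

I know this is an old question, but I have been looking for ways to speed up distance matrix computation in R. I came across the RcppParallel package, which can be used together with several distance functions to compute distances. For details, see https://cran.r-project.org/web/packages/dtwclust/vignettes/parallelization-considerations.html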
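
The general idea of spreading the pairwise computations over several workers can also be sketched with the base `parallel` package. This is not RcppParallel and not the approach from the vignette, just an illustration of the same principle under stated assumptions: `pair_dist` is a hypothetical stand-in for a real DTW function, `parallel_dist_matrix` is a hypothetical helper name, and the diagonal is assumed to be zero.

```r
library(parallel)

# Hypothetical stand-in for a DTW function; any symmetric function of
# two numeric vectors can be passed instead.
pair_dist <- function(x, y) sum(abs(x - y))

# Hypothetical helper: compute every i < j column pair, possibly on
# several cores, then mirror the values into a full symmetric matrix.
parallel_dist_matrix <- function(data, dist_fun, cores = 1L) {
  k <- ncol(data)
  pairs <- combn(k, 2, simplify = FALSE)  # all column index pairs with i < j
  # mclapply forks worker processes; on Windows only cores = 1 is supported.
  vals <- mclapply(pairs,
                   function(p) dist_fun(data[[p[1]]], data[[p[2]]]),
                   mc.cores = cores)
  out <- matrix(0, k, k, dimnames = list(names(data), names(data)))
  for (idx in seq_along(pairs)) {
    p <- pairs[[idx]]
    out[p[1], p[2]] <- out[p[2], p[1]] <- vals[[idx]]
  }
  out
}

toy <- data.frame(a = c(0, 0), b = c(1, 1), c = c(3, 3))
parallel_dist_matrix(toy, pair_dist)
```

Process-level parallelism like this carries more overhead than threads inside compiled code, so it mainly helps when each individual distance call is expensive.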