Why is my kNN so slow? Depth = 1 and k = 1 ... what is slowing me down?

Asked: 2018-06-20 07:23:12

Tags: r for-loop distance knn

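Each piece of code runs fine on its own. There are 450 timestamps, each with 60k+ agents, and if I run one piece by itself (outside the for loop) it takes about 2 seconds. Why does it take so long when I run them inside the for loop? Shouldn't it take 450 * 2 seconds? little.my.df has 50,000 rows and eligible.df has about 6,300 rows.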

library(SearchTrees)
### Make a column to put my result
eligible.df$withinradius <- vector(length = dim(eligible.df)[1])

### For loop selects which rows from little.my.df are in the same
### timestamp [i,1] and are not the same agent [i,3]. 
### There are 450 timestamps.

for (i in 1:dim(eligible.df)[1]){
  timestamp.select <- little.my.df[
    which(little.my.df[,1] == eligible.df[i,1] &
          little.my.df[,3] != eligible.df[i,3]),  # compare the agent column, not the whole data frame
    c(5,4)]
### Create a tree from timestamp.select and find the first NN from i

  test.tree <- createTree(timestamp.select,
                           treeType = 'quad',
                           dataType = 'point',
                           maxDepth = 1)
  test.lookup <- knnLookup(test.tree,
                           newdat = eligible.df[i,c(5,4)],
                           k = 1)
### Calculate the Euclidean distance to the first NN and record it in the
### blank column on the original data frame.

  eligible.df$withinradius[i] <- as.numeric(dist(matrix(
      data = c(unlist(eligible.df[i, c(5,4)]),
               unlist(timestamp.select[test.lookup[1,1], ])),
      ncol = 2, nrow = 2, byrow = TRUE)))
}
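
A back-of-envelope check on the expected run time may help here. The loop above runs once per row of eligible.df (about 6,300 passes), not once per timestamp (450 passes), and every pass filters little.my.df and builds a new tree. Assuming the rough figure of 2 seconds applies to a single pass through the loop body (an assumption, since the question does not say exactly what was timed), the arithmetic looks like this:

```r
# Figures taken from the question; 2 s per pass is the asker's rough estimate.
n_timestamps    <- 450
n_eligible_rows <- 6300
secs_per_pass   <- 2

# What the question expects: one pass per timestamp.
expected_if_per_timestamp <- n_timestamps * secs_per_pass     # seconds

# What the loop actually does: one pass per row of eligible.df.
expected_if_per_row <- n_eligible_rows * secs_per_pass        # seconds

expected_if_per_timestamp / 60    # minutes
expected_if_per_row / 3600        # hours
```

Under that assumption the loop would take hours rather than minutes, before counting the cost of writing into the data frame one cell at a time.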

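For each row in eligible.df, I want to find the first nearest neighbour in little.my.df (50,000 rows). The ACTUAL my.df has over a million rows, so I am trying to speed this up, but I can't even get it to work for 50k rows.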
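
One possible restructuring, offered as a sketch rather than a tested fix: build one tree per timestamp and look up all eligible rows for that timestamp in a single knnLookup call. The helper name nearest_other_agent is hypothetical. Column positions follow the question (1 = timestamp, 3 = agent, c(5, 4) = coordinates). The sketch assumes each agent appears at most once per timestamp in the pool, so asking for k = 2 neighbours is enough to skip the query agent itself. It also leaves maxDepth at the package default instead of 1, on the assumption that a depth-1 tree does little partitioning.

```r
library(SearchTrees)

# Hypothetical helper: distance from each row of `eligible` to its nearest
# neighbour in `pool` that has the same timestamp but a different agent.
nearest_other_agent <- function(eligible, pool) {
  out <- rep(NA_real_, nrow(eligible))
  # Row indices grouped by timestamp (column 1).
  pool_by_ts <- split(seq_len(nrow(pool)), pool[, 1])
  elig_by_ts <- split(seq_len(nrow(eligible)), eligible[, 1])

  for (ts in names(elig_by_ts)) {
    rows      <- elig_by_ts[[ts]]
    pool_rows <- pool_by_ts[[ts]]
    # Need at least two candidates so that k = 2 is meaningful.
    if (length(rows) == 0 || length(pool_rows) < 2) next

    pts <- as.matrix(pool[pool_rows, c(5, 4)])
    qry <- as.matrix(eligible[rows, c(5, 4), drop = FALSE])

    # One tree per timestamp, one lookup for all query rows.
    tree <- createTree(pts, treeType = "quad", dataType = "point")
    nn   <- knnLookup(tree, newdat = qry, k = 2)

    # Take the first neighbour unless it is the query agent itself.
    same <- pool[pool_rows[nn[, 1]], 3] == eligible[rows, 3]
    pick <- ifelse(same, nn[, 2], nn[, 1])

    out[rows] <- sqrt(rowSums((qry - pts[pick, , drop = FALSE])^2))
  }
  out
}
```

With this helper the column could be filled in one assignment, for example eligible.df$withinradius <- nearest_other_agent(eligible.df, little.my.df), which also avoids writing into the data frame row by row.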

0 answers:

No answers