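Each piece of the code works fine in a single instance. There are 450 timestamps with 60k+ agents each, and a single run (outside the for loop) takes about 2 seconds. Why does it take so long to run them in a for loop? Shouldn't it take 450 * 2 seconds? little.my.df has 50k rows and eligible.df has about 6,300 rows.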
library(SearchTrees)
### Make a column to put my result
eligible.df$withinradius <- vector(length = dim(eligible.df)[1])
### For loop selects which rows from little.my.df are in the same
### timestamp [i,1] and are not the same agent [i,3].
### There are 450 timestamps.
for (i in 1:dim(eligible.df)[1]){
  timestamp.select <- little.my.df[
    which(
      little.my.df[,1] == eligible.df[i,1] &
      little.my.df[,3] != eligible.df[i,3]),
    c(5,4)]
  ### Create a tree from timestamp.select and find the first NN from i
  test.tree <- createTree(timestamp.select,
                          treeType = 'quad',
                          dataType = 'point',
                          maxDepth = 1)
  test.lookup <- knnLookup(test.tree,
                           newdat = eligible.df[i,c(5,4)],
                           k = 1)
  ### Calculate the Euclidean distance from the first NN and record it in the
  ### blank column on the original dataframe.
  eligible.df[i,(dim(eligible.df))[2]] <- dist(matrix(
    data = c(eligible.df[i,c(5,4)],
             timestamp.select[test.lookup[1,1],]),
    ncol = 2, nrow = 2, byrow = TRUE))
}
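For each row in eligible.df, I want to find the first nearest neighbor in little.my.df (50,000 rows). The actual my.df has over a million rows, so I am trying to speed this up, but I can't even get it to work for 50k rows.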
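To make the goal concrete, here is a minimal sketch of the per-timestamp grouping I have in mind, written in base R with a brute-force distance in place of the SearchTrees quadtree. The toy data, the column names (timestamp, agent, x, y) and the helper nearest_other are all assumptions for illustration, not my real column layout. The idea is that the candidates are split once per timestamp rather than filtered once per row of eligible.df.

```r
# Hypothetical toy data; column names are assumptions, not the real layout.
set.seed(1)
little <- data.frame(
  timestamp = rep(1:3, each = 4),
  agent     = rep(1:4, times = 3),
  x         = runif(12),
  y         = runif(12)
)
eligible <- little[little$agent <= 2, ]

# Hypothetical helper: for each query row, the distance to the nearest
# candidate belonging to a different agent (brute force, no tree).
nearest_other <- function(query, cand) {
  vapply(seq_len(nrow(query)), function(j) {
    other <- cand[cand$agent != query$agent[j], , drop = FALSE]
    if (nrow(other) == 0) return(NA_real_)
    min(sqrt((other$x - query$x[j])^2 + (other$y - query$y[j])^2))
  }, numeric(1))
}

# Split the candidates once per timestamp instead of once per row.
cand_by_ts <- split(little, little$timestamp)

eligible$withinradius <- NA_real_
for (ts in unique(eligible$timestamp)) {
  rows <- which(eligible$timestamp == ts)
  eligible$withinradius[rows] <-
    nearest_other(eligible[rows, ], cand_by_ts[[as.character(ts)]])
}
```

In this sketch the outer loop runs once per timestamp, so any per-timestamp structure (a quadtree, for example) would only need to be built once per timestamp. Whether that alone accounts for the slowdown I am seeing is something I have not verified.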