Question

I wrote code that works perfectly on small datasets, but when I run it on a dataset with 52,000 features it seems to get stuck in the function below:

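import time

def extract_neighboring_OSM_nodes(ref_nodes, cor_nodes):
    time_start = time.time()
    print("here we start finding neighbors at %s" % time_start)
    for ref_node in ref_nodes:
        # Buffer the reference geometry by 10 map units
        buffered_node = ref_node[2].buffer(10)
        # Test every candidate node against that buffer (full nested loop)
        for cor_node in cor_nodes:
            if cor_node[2].within(buffered_node):
                # Record the match on both sides by FID
                ref_node[4].append(cor_node[0])
                cor_node[4].append(ref_node[0])
    #        node[4][:] = [cor_nodes.index(x) for x in cor_nodes if x[2].within(buffered_node)]
    time_end = time.time()
    # Report the elapsed time, not the raw end timestamp
    print("neighbor extraction took %s seconds" % (time_end - time_start))
    return ref_nodes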

ref_nodes and cor_nodes are lists of tuples of the form [(FID, point, geometry, links, neighbors)]. neighbors is an empty list that gets filled in by the function above.

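As I said, the last message printed is the first print statement in this function. The function seems to be very slow, but for 52,000 features it shouldn't take 24 hours, should it? What is the problem, and how can I make the function faster?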

Answer 1

You could try multiprocessing. Here is an example: http://pythongisandstuff.wordpress.com/2013/07/31/using-arcpy-with-multiprocessing-%E2%80%93-part-3/

Answer 2

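If you want the K nearest neighbors, or the eps-neighborhood, of every sample in a dataset (or only of some samples, it doesn't matter), you don't need to implement that yourself. There are libraries built specifically for this purpose.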

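Once they have built their data structure (usually some kind of tree), you can query it for the neighborhood of any given sample. These data structures generally work less well on high-dimensional data than on low-dimensional data, but there are solutions for the high-dimensional case too.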
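To make the "build once, query many times" idea concrete, here is a minimal sketch. It uses a uniform grid instead of a tree, only because a grid is short enough to read in full. It assumes the nodes can be reduced to plain (x, y) pairs and that a Euclidean distance of 10 units is an acceptable stand-in for the buffer test in the question. build_grid and query_radius are hypothetical helper names, not part of any library.

```python
from collections import defaultdict
from math import ceil, floor, hypot


def build_grid(points, cell):
    """Bucket point indices by the grid cell that contains each point."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(floor(x / cell)), int(floor(y / cell)))].append(idx)
    return grid


def query_radius(grid, points, cell, center, radius):
    """Return sorted indices of points within `radius` of `center`.

    Only the cells that the search circle can touch are inspected,
    instead of every point in the dataset.
    """
    cx, cy = center
    reach = int(ceil(radius / cell))  # number of cells the radius can span
    gx, gy = int(floor(cx / cell)), int(floor(cy / cell))
    hits = []
    for ix in range(gx - reach, gx + reach + 1):
        for iy in range(gy - reach, gy + reach + 1):
            for idx in grid.get((ix, iy), ()):
                px, py = points[idx]
                if hypot(px - cx, py - cy) <= radius:
                    hits.append(idx)
    return sorted(hits)


# Made-up coordinates, for illustration only
cor_points = [(3.0, 4.0), (50.0, 50.0), (100.0, 105.0), (0.0, 9.0)]
grid = build_grid(cor_points, cell=10.0)           # built once
near_origin = query_radius(grid, cor_points, 10.0, (0.0, 0.0), 10.0)
near_far = query_radius(grid, cor_points, 10.0, (100.0, 100.0), 10.0)
```

Each query only looks at the handful of cells around the search point, so the cost per reference node no longer grows with the total number of candidate nodes the way the nested loop does.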

The one I can recommend is the SciPy KDTree implementation.
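As a rough sketch of how that could look for the 10-unit search in the question: the code below assumes you can pull an (x, y) pair out of each node's geometry and that a plain Euclidean radius is close enough to the buffer/within test (a buffer is a polygon approximation of a circle, so results right at the boundary may differ slightly). extract_neighbors_kdtree is a hypothetical name, and the coordinates are made up.

```python
import numpy as np
from scipy.spatial import KDTree


def extract_neighbors_kdtree(ref_xy, cor_xy, radius=10.0):
    """For each reference point, list the indices of candidate points within `radius`."""
    ref_xy = np.asarray(ref_xy, dtype=float)
    cor_xy = np.asarray(cor_xy, dtype=float)
    tree = KDTree(cor_xy)  # build the index once over all candidate points
    # query_ball_point returns one list of candidate indices per query point
    return [sorted(hits) for hits in tree.query_ball_point(ref_xy, r=radius)]


# Made-up coordinates, for illustration only
ref_xy = [(0.0, 0.0), (100.0, 100.0)]
cor_xy = [(3.0, 4.0), (50.0, 50.0), (100.0, 105.0), (0.0, 9.0)]
neighbors = extract_neighbors_kdtree(ref_xy, cor_xy)
```

The returned indices are positions in cor_xy, so turning them back into FIDs is a simple lookup in the original cor_nodes list, after which both neighbor lists can be filled in as in the question.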

I hope you find it as useful as I did.

Very slow function with two for loops in Python using Arcpy
