
Question

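I have a working script, but I am running into performance problems.

What I want: the height of buildings.

What I have: a file with the GPS coordinates of the buildings I am interested in (millions of them, possibly with duplicates), all located in the Netherlands, and a file with the height of every building in the Netherlands. The latter is a GML file, but I was unable to load it properly with networkx, so I use a workaround with xml.etree.ElementTree and handle it as an XML file. You can download it here if you like.

Approach: from the GML file I was able to build a dictionary with one entry per building (about 3 million), with the coordinates of the polygon outline as the key and the height as the value. For example: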

dictHeights = {((157838.00015090595, 461662.000273708), (157838.00015090595, 461662.000273708), (157781.32815085226, 461515.93227361), (157781.32815085226, 461515.93227361), (157781.32815085226, 461515.93227361)): 9.41, ...}

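With the following two home-made functions I can loop over all keys and find the height. This works fine, but because I am dealing with millions of points (addresses) and polygons (buildings), I have a serious performance problem. Getting just 5 heights already takes several minutes. One solution might be to use a tree structure, but I don't see how that would reduce the run time: either I have to build one huge tree, which takes a lot of time, or the trees are small and then all the steps are time consuming. Basically, I would like to get rid of the big for loop if possible.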

import numpy as np
import matplotlib.path as mpltPath

# This function is the bottleneck

def getMedianHeight(dictHeights, points):
    heights = []
    for point in points:
        found = False
        for key, value in dictHeights.items():
            path = mpltPath.Path(key)
            if path.contains_point(point):
                heights.append(value)
                found = True
                break
        if not found:
            heights.append(-999)
    return heights
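The nested loop in getMedianHeight tests every polygon for every point. One common way to cut that down is a uniform grid over the bounding boxes, so each point is only tested against the polygons registered in its own cell. The following is a minimal sketch of that idea, not a tuned implementation. The names point_in_polygon, build_grid_index, get_heights_indexed and the cell_size parameter are all hypothetical, and the ray-casting helper is only a dependency-free stand-in for mpltPath.Path(key).contains_point(point).

```python
from collections import defaultdict
from math import floor


def point_in_polygon(point, polygon):
    """Ray-casting test; stand-in for mpltPath.Path(polygon).contains_point(point)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_grid_index(dict_heights, cell_size):
    """Map each grid cell to the (polygon, height) pairs whose bounding box touches it."""
    grid = defaultdict(list)
    for polygon, height in dict_heights.items():
        xs = [px for px, _ in polygon]
        ys = [py for _, py in polygon]
        for cx in range(floor(min(xs) / cell_size), floor(max(xs) / cell_size) + 1):
            for cy in range(floor(min(ys) / cell_size), floor(max(ys) / cell_size) + 1):
                grid[(cx, cy)].append((polygon, height))
    return grid


def get_heights_indexed(grid, points, cell_size, missing=-999):
    """Look up one height per point, testing only polygons in the point's cell."""
    heights = []
    for x, y in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        for polygon, height in grid.get(cell, ()):
            if point_in_polygon((x, y), polygon):
                heights.append(height)
                break
        else:
            # No candidate polygon in this cell contains the point.
            heights.append(missing)
    return heights
```

The index is built once with build_grid_index(dictHeights, cell_size) and then reused for all points. A cell size somewhat larger than a typical building footprint keeps the candidate list per cell short; the right value depends on the data and is an assumption to tune.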

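I am not sure how to create dummy data that replicates my data. My main source was here. The problem is creating the dictionary in the proper way.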

# random set of points to test
N = 1000  # this should be about 4 million
points = zip(np.random.random(N),np.random.random(N))

# this creates an array of polygons, but they overlap, which they don't in my data
lenpoly = 100
M = 100  # this should be about 3 million

polygons = tuple([tuple((np.sin(x)+0.5,np.cos(x)+0.5) for x in np.linspace(np.pi,lenpoly)[:-1]) for m in np.random.random(M)])

#create array of virtual heights
heights = 100*np.random.random_sample(M)

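The following line results in a dictionary with only one entry. I have tried about 1000 ways to construct the dictionary the way I want (all polygons as keys, heights as values), but I cannot get it to work. Either polygons ends up as a single generator, or as M generators, or it is built as it should be (as it is now), but then the dictionary does not come out right.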

dictHeights = dict((polygon, height) for polygon, height in zip(polygons, heights))
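A likely reason for the single entry: in the expression that builds polygons, the loop variable m never enters the coordinates, so all M polygons are identical tuples. Identical tuples compare and hash equal, so they collapse into one dictionary key. Below is a minimal sketch of dummy data with distinct, non-overlapping outlines. The helper make_square and the lattice layout are assumptions for illustration only, and squares replace the circles of the original example to keep it short.

```python
import math
import random


def make_square(cx, cy, half=0.4):
    """Closed square outline centred on (cx, cy), as a hashable tuple of tuples."""
    return (
        (cx - half, cy - half),
        (cx + half, cy - half),
        (cx + half, cy + half),
        (cx - half, cy + half),
        (cx - half, cy - half),
    )


M = 100  # number of dummy buildings
side = math.ceil(math.sqrt(M))

# Place one square per lattice cell; half < 0.5 keeps neighbours from overlapping.
polygons = [make_square(i % side, i // side) for i in range(M)]

# One random dummy height per polygon.
heights = [100 * random.random() for _ in range(M)]

# Every key is a different tuple, so the dictionary keeps all M entries.
dictHeights = dict(zip(polygons, heights))
```

Test points for this layout would need to be drawn from the range 0 to side rather than 0 to 1.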

result = getMedianHeight(dictHeights, points)

To make an MWE, here is a small piece of the real data:

dictHeights = {((151922.594999999, 601062.109999999), (151915.193, 601067.614999998), (151919.848000001, 601073.874000002), (151927.25, 601068.368999999), (151922.594999999, 601062.109999999)): 9.16, ((151229.125999998, 601124.223999999), (151231.934, 601113.313000001), (151225.774, 601111.728), (151222.965999998, 601122.638999999), (151229.125999998, 601124.223999999)): 7.695}
points = [(157838.00015090595, 461662.000273708), (157838.00015090595, 461662.000273708), (157781.32815085226, 461515.93227361), (157781.32815085226, 461515.93227361), (157781.32815085226, 461515.93227361)]
result = getMedianHeight(dictHeights, points)

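Note: the result of this MWE is -999 for all points, because the polygons in the dictionary are not the right buildings, but you should get the point :-)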
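The points list in the MWE repeats the same coordinates several times, as the real data may. A minimal sketch of resolving each distinct point only once and reusing the result for repeats is shown here. The function heights_for_unique_points and its lookup argument are hypothetical names; lookup can be any callable that returns the height for a single point.

```python
def heights_for_unique_points(points, lookup):
    """Call lookup(point) once per distinct point and reuse the result for repeats."""
    cache = {}
    heights = []
    for point in points:
        if point not in cache:
            # First time this coordinate pair is seen: do the expensive search.
            cache[point] = lookup(point)
        heights.append(cache[point])
    return heights
```

With the function from the question, lookup could be written as lambda p: getMedianHeight(dictHeights, [p])[0]. This only helps in proportion to how many duplicates the data really contains.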

Performance issue: finding which polygon a point lies in

