Question

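I have a geoJSON file that contains the subdivision of a geographic area into ca. 7000 cells. I want to a) open this geoJSON, b) modify some data (see the code below), and c) write the modified geoJSON to disk. My problem is that, because there are so many cells, this takes almost a minute. Do you have any idea how to speed up this function? Thanks!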

import json
import time

def writeGeoJSON(param1, param2, inputdf):
    start = time.time()
    # Read the input GeoJSON (about 7000 features)
    with open('ingeo.geojson') as f:
        data = json.load(f)
    for feature in data['features']:
        # Rows of inputdf matching this cell and the requested parameters
        currentfeature = inputdf[(inputdf['SId'] == feature['properties']['cellId']) & (inputdf['param1'] == param1) & (inputdf['param2'] == param2)]
        if len(currentfeature) > 0:
            feature['properties'].update({"style": {"opacity": currentfeature.Opacity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})
    end = time.time()
    print(f"processing took {end - start:.1f} s")
    with open('outgeo.geojson', 'w') as outfile:
        json.dump(data, outfile)

Answer 1

There is a serial-code optimization you can make in your code. You have this line:

currentfeature = inputdf[(inputdf['SId']==feature['properties']['cellId']) & (inputdf['param1']==param1) & (inputdf['param2']==param2)]

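Note that the last two checks can be moved outside the for loop. They are redundant: each one compares a whole column of inputdf, and the result is the same on every iteration, yet it costs many CPU clock cycles each time through the loop! You can change it to: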

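# Compute the parameter mask once, outside the loop.
# The parentheses matter: & binds more tightly than ==.
paramMatch = (inputdf['param1'] == param1) & (inputdf['param2'] == param2)
for feature in data['features']:
    currentfeature = inputdf[(inputdf['SId'] == feature['properties']['cellId']) & paramMatch]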

This should make your program run faster!
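A minimal sketch that takes the same idea (move loop-invariant work out of the loop) one step further than the answer does: the SId match can also be replaced by building a cellId -> Opacity dict once, so each feature costs one dictionary lookup instead of a boolean mask over the whole DataFrame. It assumes inputdf is a pandas DataFrame with the SId, param1, param2 and Opacity columns used in the question; apply_styles is a hypothetical name, and it works on the already-loaded data dict so the file reading and writing stay as in the question.

```python
import pandas as pd  # inputdf is assumed to be a pandas DataFrame


def apply_styles(data, inputdf, param1, param2):
    """Hypothetical helper: set feature['properties']['style'] in place."""
    # Filter on the loop-invariant parameters once, outside the loop.
    sub = inputdf[(inputdf['param1'] == param1) & (inputdf['param2'] == param2)]
    # Build the cellId -> Opacity dict once. .tolist() yields plain Python
    # scalars, so json.dump can serialize them (numpy.int64 would fail).
    opacity_by_cell = dict(zip(sub['SId'].tolist(), sub['Opacity'].tolist()))
    for feature in data['features']:
        cell_id = feature['properties']['cellId']
        # Constant-time dict lookup; unmatched cells get opacity 0.
        feature['properties']['style'] = {"opacity": opacity_by_cell.get(cell_id, 0)}
    return data
```

One behavioural difference: the question's .item() raises if a cell matches more than one row, while the dict silently keeps the last matching row.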

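That said, if you need even better execution time (most likely unnecessary), try parallelizing the processing part of the code with the multiprocessing module, for example by splitting the workload of the for loop into chunks.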

Try using apply_async or map_async over ranges of iterations to speed things up!
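A sketch of the map_async suggestion, under assumptions that are not in the answer: the per-cell opacities have already been collected into a plain dict (hypothetical opacity_by_cell), because sending the whole DataFrame to every worker would be expensive; the features are split into n_chunks slices; and the fork start method is available (Linux and other POSIX systems). apply_styles_parallel and _styles_for_chunk are hypothetical names.

```python
import multiprocessing as mp


def _styles_for_chunk(args):
    """Worker: compute the style dicts for one chunk of cell IDs."""
    cell_ids, opacity_by_cell = args
    return [{"opacity": opacity_by_cell.get(c, 0)} for c in cell_ids]


def apply_styles_parallel(features, opacity_by_cell, n_chunks=4):
    """Hypothetical helper: split the feature loop into chunks and use map_async."""
    cell_ids = [f['properties']['cellId'] for f in features]
    size = max(1, -(-len(cell_ids) // n_chunks))  # ceiling division
    chunks = [(cell_ids[i:i + size], opacity_by_cell)
              for i in range(0, len(cell_ids), size)]
    # "fork" is assumed (POSIX only). With the "spawn" start method
    # (Windows, macOS) use mp.Pool and call this under `if __name__ == "__main__":`.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=n_chunks) as pool:
        async_result = pool.map_async(_styles_for_chunk, chunks)
        results = async_result.get()  # blocks until every chunk is done
    # map_async keeps the input order, so styles line up with features.
    styles = [s for chunk in results for s in chunk]
    for feature, style in zip(features, styles):
        feature['properties']['style'] = style
    return features
```

With only ~7000 features, process start-up and pickling may cost more than the loop itself, so time it against the serial version before keeping it.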

Answer 2

[In addition to @varun's optimization, this also includes @romain-aga's suggestion.]

Add this at the start of the function:

zero_style = {"opacity": 0}

and change the condition to:

if (len(currentfeature) > 0):
    feature['properties']['style'] = {"opacity": currentfeature.Opacity.item()}
else:
    feature['properties']['style'] = zero_style
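A small stdlib-only sketch of why reusing one zero_style dict is safe for this task: json.dumps writes the shared object out once per reference, so the output is unchanged, while thousands of identical dicts no longer have to be created. The one caveat, aliasing, is shown at the end; the three-feature list is made-up sample data.

```python
import json

zero_style = {"opacity": 0}  # one dict object, reused for every unmatched feature

features = [{"properties": {"cellId": i}} for i in range(3)]
for feature in features:
    feature["properties"]["style"] = zero_style  # no new dict per feature

# json.dumps serializes the shared dict once per reference, so the text
# is the same as if each feature had its own {"opacity": 0}.
text = json.dumps(features)

# Caveat: all features now point at the same object, so an in-place
# change made through one feature shows up in every one of them.
features[0]["properties"]["style"]["opacity"] = 1
```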

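My impression is that knowing more about the type of inputdf would allow better optimizations (maybe a plain if currentfeature: would be enough? Or even better?). If inputdf is a pandas DataFrame, as its name and the boolean indexing suggest, if currentfeature: raises ValueError because the truth value of a DataFrame is ambiguous; if not currentfeature.empty: is the explicit form.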
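A quick check of the truthiness question, assuming inputdf is a pandas DataFrame: pandas raises ValueError instead of guessing a truth value, so the emptiness test has to be spelled out. has_rows and truthiness_raises are hypothetical helper names.

```python
import pandas as pd


def has_rows(df: pd.DataFrame) -> bool:
    """True if the filtered DataFrame contains at least one row."""
    # .empty is the explicit emptiness check for a DataFrame.
    return not df.empty


def truthiness_raises(df: pd.DataFrame) -> bool:
    """Return True if bool(df) raises ValueError, as it does for DataFrames."""
    try:
        bool(df)
    except ValueError:
        return True
    return False
```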

Assuming CPython, I would expect this to be better (it is better to ask forgiveness than permission):

try:
    value = {"opacity": currentfeature.Opacity.item()}
except ValueError:  # pandas Series.item() raises ValueError unless there is exactly one element
    value = zero_style
feature['properties']['style'] = value
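The "I would expect this to be better" claim is easy to measure. This sketch times both variants with timeit on a tiny made-up DataFrame; it assumes pandas, whose Series.item() raises ValueError when the Series does not hold exactly one element. style_lbyl and style_eafp are hypothetical names. EAFP tends to pay off when the try path usually succeeds, since raising and catching an exception has its own cost, so which one wins here depends on how often cells miss; no result is claimed.

```python
import timeit

import pandas as pd

ZERO_STYLE = {"opacity": 0}


def style_lbyl(rows: pd.DataFrame) -> dict:
    # "Look before you leap": check the length first.
    if len(rows) > 0:
        return {"opacity": rows.Opacity.item()}
    return ZERO_STYLE


def style_eafp(rows: pd.DataFrame) -> dict:
    # "Easier to ask forgiveness": let .item() fail on an empty selection.
    try:
        return {"opacity": rows.Opacity.item()}
    except ValueError:
        return ZERO_STYLE


if __name__ == "__main__":
    df = pd.DataFrame({"SId": [1, 2], "Opacity": [0.5, 0.8]})
    hit, miss = df[df["SId"] == 1], df[df["SId"] == 3]
    for name, fn in [("lbyl", style_lbyl), ("eafp", style_eafp)]:
        for label, rows in [("hit", hit), ("miss", miss)]:
            t = timeit.timeit(lambda: fn(rows), number=10_000)
            print(f"{name} {label}: {t:.3f} s")
```

Note that the EAFP version also returns the zero style when several rows match, where the length check followed by .item() raises.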

Python - improve read/modify/write speed?
