Question

I have a CSV file of latitude/longitude coordinates for store locations. I also have a separate GeoJSON file of regions.

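I want to loop through the list of locations and check whether each coordinate falls inside one of the polygons in the GeoJSON file. If it does, I want to pull specific information from that polygon and write a new file with the updated information.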

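I wrote Python code using the Shapely library to do this, but the GeoJSON file is over 3 GB. I ran it, and after several hours it had not even written the header row to the new file.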

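However, when I switch to a smaller file (a single state instead of the whole US, 5 MB in size), it takes only about an hour to run, and the header is written almost immediately (that part is quick). Even so, this still feels far too long for Python.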

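I'm wondering whether something in my loop or logic is making the code run far more times than it should, or how I can make the program faster so that I can get this new file as soon as possible.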

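Here is a code sample (I loop over each row of the CSV file, store the lat and long as a point, then loop over the GeoJSON features and use a Shapely function to check whether the point lies inside a feature):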

    import json
    import csv
    import datetime

    from shapely.geometry import shape, Point

    mapFile = 'censusFile.geojson'

    # newline='' is what the csv module expects for file objects on Python 3
    locationCsvFile = open('locations.csv', newline='')
    locationsFile = csv.reader(locationCsvFile)

    csvFile = open('new_locations_csv', 'a', newline='')
    newFile = csv.writer(csvFile)

    firstRow = True

    for row in locationsFile:
        foundPoint = False
        # check if first row is true, then add columns
        if firstRow:
            row.append("x")
            row.append("y")
            row.append("z")
            newFile.writerow(row)
            # set first row to false
            firstRow = False
            continue

        # get point coordinates from file
        pointLat = float(row[13])
        pointLon = float(row[14])

        point = Point(pointLon, pointLat)

        # open and parse the GeoJSON file (this happens once per CSV row)
        with open(mapFile) as f:
            data = json.load(f)

            # loop through features in GeoJSON file
            for feature in data['features']:
                if foundPoint:
                    break
                polygon = shape(feature["geometry"])
                if polygon.contains(point):
                    # change variable
                    foundPoint = True

                    # grab data
                    newx = feature["properties"]["x"]
                    newy = feature["properties"]["y"]
                    newz = feature["properties"]["z"]
                    # append data
                    row.append(newx)
                    row.append(newy)
                    row.append(newz)
                    # update file
                    newFile.writerow(row)

    locationCsvFile.close()

    csvFile.close()
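
For comparison, the per-row flow of the code above can be written more compactly with the polygon lookup factored out. This is only a sketch: `annotate_rows` and its `lookup` parameter are hypothetical names, and `lookup` stands for any callable that takes `(lon, lat)` and returns a properties dict or `None`. The column indexes 13 and 14 and the `x`, `y`, `z` keys are taken from the code above.

```python
import csv
import io


def annotate_rows(reader, writer, lookup, lat_col=13, lon_col=14):
    """Copy rows from reader to writer, appending x/y/z for matched rows.

    `lookup` is a hypothetical callable: lookup(lon, lat) -> dict or None.
    Returns the number of data rows written.
    """
    header = next(reader)  # consume the header once, no first-row flag needed
    writer.writerow(header + ["x", "y", "z"])

    written = 0
    for row in reader:
        props = lookup(float(row[lon_col]), float(row[lat_col]))
        if props is None:
            continue  # as in the code above, unmatched rows are not written
        writer.writerow(row + [props["x"], props["y"], props["z"]])
        written += 1
    return written


# Tiny in-memory demo with made-up data; lat is column 0 and lon is column 1.
demo_in = io.StringIO("lat,lon\n1.0,1.0\n50.0,50.0\n")
demo_out = io.StringIO()
demo_lookup = lambda lon, lat: {"x": "a", "y": "b", "z": "c"} if lon < 10 else None
demo_written = annotate_rows(
    csv.reader(demo_in), csv.writer(demo_out), demo_lookup, lat_col=0, lon_col=1
)
```

Keeping the lookup separate means the file handling stays the same no matter how the polygon search is implemented.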

Does Python really need hours to loop through a large file?
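
One thing that stands out in the code above is that the GeoJSON file is opened and parsed again for every CSV row, and every polygon is rebuilt with `shape()` each time. A minimal sketch of doing that work once, assuming the parsed features fit in memory, could look like the following. The helper names `build_polygons`, `load_polygons`, and `find_properties` are hypothetical. `shapely.prepared.prep` and the bounding-box check are optional extras that are not in the original code.

```python
import json

from shapely.geometry import Point, shape
from shapely.prepared import prep


def build_polygons(features):
    """Build (bounds, prepared geometry, properties) for each feature, once."""
    polygons = []
    for feature in features:
        geom = shape(feature["geometry"])
        polygons.append((geom.bounds, prep(geom), feature["properties"]))
    return polygons


def load_polygons(path):
    """Parse the GeoJSON file a single time and build all polygons from it."""
    with open(path) as f:
        data = json.load(f)
    return build_polygons(data["features"])


def find_properties(lon, lat, polygons):
    """Return the properties of the first polygon containing the point, else None."""
    point = Point(lon, lat)
    for (minx, miny, maxx, maxy), prepared, props in polygons:
        # Cheap bounding-box rejection before the exact containment test.
        if minx <= lon <= maxx and miny <= lat <= maxy and prepared.contains(point):
            return props
    return None


# Made-up demo data: two squares standing in for the real regions.
demo_features = [
    {"type": "Feature",
     "properties": {"x": "A", "y": 1, "z": 10},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"x": "B", "y": 2, "z": 20},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[10, 10], [12, 10], [12, 12], [10, 12], [10, 10]]]}},
]
demo_polygons = build_polygons(demo_features)
demo_match = find_properties(1.0, 1.0, demo_polygons)
```

With this structure, `load_polygons(mapFile)` would be called once before the CSV loop and `find_properties` once per row. Parsing a 3 GB file with `json.load` even once still needs a lot of memory, so this sketch does not address that part.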

0 answers: