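I am trying to use a spatial index to speed up intersection tests on spatial objects: checking whether points lie inside a polygon and, if they do not, dropping them from the DataFrame. Initially this was implemented the laborious way, without an index, for example: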
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import shape, Point
import shapefile
from rtree import index

if __name__ == '__main__':
    # import data from csv, shp
    inst = pd.read_csv('inst.csv',encoding='utf-8')[['lat', 'lng']]
    ht = pd.read_csv('ht.csv',encoding='utf-8')[['lat', 'lng']]
    myshp = open("./shp/pr.shp","rb")
    mydbf = open("./shp/pr.dbf","rb")
    pr = shape(shapefile.Reader(shp=myshp, dbf=mydbf).shapes()[1])
    print(pr.bounds)
    inst_idx = index.Index()
    ht_idx = index.Index()
    # indexing
    for idx,row in inst.iterrows():
        inst_idx.insert(idx,(row['lng'].astype(float),row['lat'].astype(float),row['lng'].astype(float),row['lat'].astype(float)))
    for idx,row in ht.iterrows():
        ht_idx.insert(idx,(row['lng'].astype(float),row['lat'].astype(float),row['lng'].astype(float),row['lat'].astype(float)))
    # intersection and give some possibly false positive result
    inst_nb = list(inst_idx.intersection(pr.bounds))
    ht_nb = list(ht_idx.intersection(pr.bounds))
    # see if they are really in the polygon
    for id in inst_nb:
        if not pr.contains(Point(inst.iloc[id]['lng'].astype(float),inst.iloc[id]['lat'].astype(float))):
            inst.drop(id,inplace=True)
    for id in ht_nb:
        if not pr.contains(Point(ht.iloc[id]['lng'].astype(float),ht.iloc[id]['lat'].astype(float))):
            ht.drop(id,inplace=True)
That is slow. Now I am trying to build an equivalent using an rtree index. Running it gives:
Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg2018/inst_per_ht1.py", line 40, in <module>
    if not pr.contains(Point(inst.iloc[id]['lng'].astype(float),inst.iloc[id]['lat'].astype(float))):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1373, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1830, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1713, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
I expected this to produce the same result as the first code block. However, it keeps raising an index out-of-bounds error.
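A likely cause, judging from the traceback: the ids stored in the rtree index are the DataFrame's row labels (they come from iterrows()), but the loop reads rows back with iloc, which is positional. After the first drop() the frame is shorter, so a label near the end no longer exists as a position. Even when no error is raised, iloc can silently return a different row than the one that was indexed. The snippet below is a minimal reproduction on a made-up four-row frame (the values are invented for illustration, not taken from inst.csv):

```python
import pandas as pd

# Toy frame standing in for inst; labels 0..3 come from the default RangeIndex.
df = pd.DataFrame({"lat": [10.0, 11.0, 12.0, 13.0],
                   "lng": [20.0, 21.0, 22.0, 23.0]})

df.drop(1, inplace=True)       # frame now has 3 rows, with labels 0, 2, 3

by_label = df.loc[3, "lng"]    # label lookup still finds the row labelled 3

try:
    df.iloc[3]                 # only positions 0..2 exist now
    positional_error = None
except IndexError as exc:
    positional_error = str(exc)

print(by_label)
print(positional_error)
```

Under that reading, replacing inst.iloc[id] with inst.loc[id] (and likewise for ht) should make the lookup consistent with the ids that were inserted into the index.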
Feel free to use the column names lat and lon.
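One more thing that may explain a mismatch with a brute-force result: the loops in the posted code only visit ids returned by the bounding-box query, so rows that fall outside pr.bounds are never dropped. A way to sidestep both this and any label/position confusion is to collect the labels to keep and select them once at the end. The sketch below is only an illustration under stated assumptions: keep_inside and its contains argument are hypothetical names, a vectorized bounding-box mask stands in for the rtree query, and a simple inequality stands in for pr.contains(Point(x, y)) so that the example runs without shapely.

```python
import pandas as pd

def keep_inside(df, bounds, contains):
    """Return the rows of df whose (lng, lat) point passes contains().

    bounds   -- (minx, miny, maxx, maxy), e.g. pr.bounds
    contains -- hypothetical callable (lng, lat) -> bool, e.g.
                lambda x, y: pr.contains(Point(x, y))
    """
    minx, miny, maxx, maxy = bounds
    # Cheap bounding-box prefilter; an rtree intersection query plays the same role.
    in_box = df["lng"].between(minx, maxx) & df["lat"].between(miny, maxy)
    candidates = df[in_box]
    # Exact test only on the candidates; collect row labels, not positions.
    keep = [label for label, row in candidates.iterrows()
            if contains(row["lng"], row["lat"])]
    # Rows outside the box are excluded as well, because they never enter keep.
    return df.loc[keep]


# Toy check: a made-up predicate replaces the shapely polygon test.
pts = pd.DataFrame({"lng": [0.2, 0.9, 5.0], "lat": [0.2, 0.9, 5.0]})
inside = keep_inside(pts, (0, 0, 1, 1), lambda x, y: x + y <= 1)
print(list(inside.index))
```

Because the original frame is never mutated during the loop, the labels stay valid throughout, and the function returns a new frame rather than dropping rows in place.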