Rtree spatial index not working with pandas

Time: 2017-12-03 00:51:26

Tags: python spatial spatial-query spatial-index r-tree

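I'm trying to use a spatial index to speed up intersecting spatial objects: checking whether points lie inside a polygon and, if they don't, dropping them from the dataframe. Originally this was implemented the laborious way, without an index, for example: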

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from shapely.geometry import shape, Point
import shapefile
from rtree import index

if __name__ == '__main__':
    # import data from csv, shp
    inst = pd.read_csv('inst.csv',encoding='utf-8')[['lat', 'lng']]
    ht = pd.read_csv('ht.csv',encoding='utf-8')[['lat', 'lng']]

    myshp = open("./shp/pr.shp","rb")
    mydbf = open("./shp/pr.dbf","rb")

    pr = shape(shapefile.Reader(shp=myshp, dbf=mydbf).shapes()[1])

    print(pr.bounds)

    inst_idx = index.Index()
    ht_idx = index.Index()

    # indexing
    for idx,row in inst.iterrows():
        inst_idx.insert(idx,(row['lng'].astype(float),row['lat'].astype(float),row['lng'].astype(float),row['lat'].astype(float)))

    for idx,row in ht.iterrows():
        ht_idx.insert(idx,(row['lng'].astype(float),row['lat'].astype(float),row['lng'].astype(float),row['lat'].astype(float)))

    # intersection and give some possibly false positive result
    inst_nb = list(inst_idx.intersection(pr.bounds))
    ht_nb = list(ht_idx.intersection(pr.bounds))

    # see if they are really in the polygon
    for id in inst_nb:
        if not pr.contains(Point(inst.iloc[id]['lng'].astype(float),inst.iloc[id]['lat'].astype(float))):
            inst.drop(id,inplace=True)

    for id in ht_nb:
        if not pr.contains(Point(ht.iloc[id]['lng'].astype(float),ht.iloc[id]['lat'].astype(float))):
            ht.drop(id,inplace=True)

This is slow, so now I'm trying to make an equivalent version using rtree, which fails with:

Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg2018/inst_per_ht1.py", line 40, in <module>
    if not pr.contains(Point(inst.iloc[id]['lng'].astype(float),inst.iloc[id]['lat'].astype(float))):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1373, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1830, in _getitem_axis
    self._is_valid_integer(key, axis)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1713, in _is_valid_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

I expected this to give the same result as the first code block. However, it keeps throwing an index out-of-bounds error.
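A likely cause, offered as an assumption since the post does not confirm it: the ids returned by `inst_idx.intersection()` are whatever was passed to `insert()`, which in the loop above are the DataFrame index labels yielded by `iterrows()`. `iloc`, however, is positional. As soon as `drop(..., inplace=True)` removes a row, labels and positions stop lining up, so `iloc[id]` can return the wrong row or run past the end of the shrunken frame. This pandas-only sketch reproduces that on a made-up five-row frame (the data and the choice of dropped rows are illustrative, not from the question):

```python
import pandas as pd

# Toy stand-in for `inst`. iterrows() yields these index *labels* (0..4),
# and those labels are what the question's loop passes to idx.insert().
df = pd.DataFrame({"lng": [0.5, 5.0, 1.5, 9.0, 2.5],
                   "lat": [0.5, 5.0, 1.5, 9.0, 2.5]})

# Pretend labels 1 and 3 fail pr.contains(), so the loop drops them in place.
df.drop(1, inplace=True)
df.drop(3, inplace=True)
print(df.index.tolist())       # [0, 2, 4]

# Positions are now 0..2, so a label used as a position can silently pick
# the wrong row (position 2 now holds label 4) ...
print(df.iloc[2].name)         # 4

# ... or fall off the end of the shrunken frame.
caught = None
try:
    df.iloc[4]
except IndexError as exc:
    caught = exc
print(caught)                  # single positional indexer is out-of-bounds

# Label-based .loc keeps working after rows are dropped.
print(df.loc[4].tolist())      # [2.5, 2.5]
```

If this is the cause, switching the lookups to `inst.loc[id]` and `ht.loc[id]` should avoid the mismatch, assuming the CSVs are read with pandas' default integer index.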

Feel free to use all of the sample data in the lon and lat columns.
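One way to restructure the filter, sketched under assumptions: the `rtree` and `shapely` packages are installed, the point columns are named `lng` and `lat` as in the question's code, and `points_in_polygon` is a hypothetical helper name. It inserts each point as a zero-area box keyed by its position, queries the polygon's bounding box for candidates, confirms them with `contains()`, and selects the survivors once at the end instead of calling `drop()` inside the loop:

```python
import pandas as pd
from rtree import index
from shapely.geometry import Point, Polygon


def points_in_polygon(df, poly):
    """Hypothetical helper: keep only rows whose (lng, lat) point is inside poly."""
    lngs = df["lng"].astype(float).to_numpy()
    lats = df["lat"].astype(float).to_numpy()

    # Key each entry by its *position* in df; df is not modified below,
    # so these positions stay valid for the whole function.
    idx = index.Index()
    for pos, (x, y) in enumerate(zip(lngs, lats)):
        idx.insert(pos, (x, y, x, y))  # a point is a zero-area box

    # Stage 1: cheap bounding-box query; may include false positives.
    candidates = idx.intersection(poly.bounds)

    # Stage 2: exact point-in-polygon test on the candidates only.
    keep = [pos for pos in candidates if poly.contains(Point(lngs[pos], lats[pos]))]

    # Select the survivors once, in their original order.
    return df.iloc[sorted(keep)]


# Usage with toy data: a right triangle whose bounding box is (0, 0, 2, 2).
triangle = Polygon([(0, 0), (2, 0), (0, 2)])
pts = pd.DataFrame({"lng": [0.5, 5.0, 1.5, 0.2],
                    "lat": [0.5, 5.0, 1.5, 1.0]},
                   index=[10, 11, 12, 13])
# (5, 5) is outside the box; (1.5, 1.5) is inside the box but not the triangle.
print(points_in_polygon(pts, triangle).index.tolist())  # [10, 13]
```

Because the dataframe is never mutated while ids are being looked up, positions stay valid for the whole call, and `iloc` works even when the frame has a non-integer index. With the question's objects the call would presumably be `inst = points_in_polygon(inst, pr)`, though that has not been run against the original shapefile.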

0 answers