Can't figure out IndexError: array

Time: 2017-05-08 03:49:28

Tags: python filter distance

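My task is to remove longitude/latitude coordinates when the distance between points falls within a given threshold (5 km, 10 km, or 30 km). This is for modeling purposes, to avoid clustering. I am using the haversine formula to measure the distance.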
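
For reference, the haversine distance can be sketched as a small standalone function. This is only a minimal sketch and not part of the original script: the names `haversine_km` and `EARTH_RADIUS_KM` are made up for illustration, and 6371 km is the same Earth radius the script uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in km (illustrative constant name)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of latitude along a meridian is roughly 111 km.
print(haversine_km(0.0, 0.0, 0.0, 1.0))
```

A point compared with itself gives a distance of 0, which matters for any filter that keeps only points farther away than the threshold.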

Here is the outline of my initial code:

load the geometry records from the point shapefile,
convert them to an array,
compare each pair of coordinates and measure the distance,
then remove the longitude/latitude pairs that are
close to each other,

but I am stuck on this last step.

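What I had planned was to update the list of coordinate pairs and then iterate again over the new pairs.

Running the script below gives me this error:

IndexError: too many indices for array

It seems the index in the iteration is not being updated; it still uses the index from the first pass.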

import math, easygui, shapefile, itertools, os
import pandas as pd
import numpy as np

filepath = easygui.fileopenbox()

input_dist = int(raw_input("Distance Filter Value?: "))
input_crop = raw_input("what crop?: ")

directory = os.path.split(filepath)[0]

def dist_haversine(shp,input_dist,input_crop):
    """
    Drop points that lie within input_dist km of another point, using the
    haversine great-circle distance (coordinates in decimal degrees)
    """

    r = shapefile.Reader(shp)
    idx = np.arange(len(r.records()))
    coordinates = []
    for i in idx:
        geom = r.shape(i)
        coordinates.append(geom.points[0])    

    acoords = np.array(coordinates)

    for r,n in itertools.izip(acoords[:,0],acoords[:,1]):

        coordinates_ = []

        for i,j in itertools.izip(acoords[:,0],acoords[:,1]):

            lon1=r
            lat1=n
            lon2=i
            lat2=j

            lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])

            # haversine formula
            dlon = lon2 - lon1 
            dlat = lat2 - lat1 
            a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
            c = 2 * math.asin(math.sqrt(a)) 
            km = c*6371 #/1000.0

            if km > input_dist:
                coordinates_.append([i,j])

        coordinates[:] = coordinates_
        acoords = np.array(coordinates)

    df_coords_ = pd.DataFrame(coordinates).drop_duplicates().values
    df_coords = pd.DataFrame(df_coords_, columns=['Lon','Lat'])

    df_coords.insert(0, 'Crop', input_crop)  

    return df_coords.to_csv(os.path.split(directory)[0] + "\\" + "%s_distFilter_%skm.csv" % (input_crop, input_dist), sep=",", index=None)
  

Traceback:

     
File "<ipython-input-3-8e88eba2ab54>", line 1, in <module>
    dist_haversine(filepath,input_dist,input_crop)
File "<ipython-input-2-d43a1f1da26a>", line 20, in dist_haversine
    for i,j in itertools.izip(acoords[:,0],acoords[:,1]):
IndexError: too many indices for array
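
One likely explanation, sketched here on made-up data rather than the shapefile: each pass keeps only the points farther than the threshold from the current point, so the current point itself (distance 0) is always dropped and the list can eventually become empty. `np.array([])` has a single axis, so the `acoords[:,0]` slice on the next pass fails. Note also that the outer `izip` was built from the original array, so it keeps walking the original points even after `acoords` is reassigned. The `reshape(-1, 2)` workaround at the end is a suggestion, not something from the original script.

```python
import numpy as np

full = np.array([[120.9, 14.6], [121.0, 14.7]])  # made-up lon/lat pairs
empty = np.array([])  # what np.array(coordinates) returns for an empty list

print(full.shape)   # (2, 2): two axes, so full[:, 0] works
print(empty.shape)  # (0,): one axis only

try:
    empty[:, 0]  # two indices on a 1-D array
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)

# Possible workaround: force two columns so slicing stays valid when empty.
safe = np.array([]).reshape(-1, 2)
print(safe.shape, safe[:, 0].shape)
```

With the two-column shape enforced, the inner loop would simply iterate zero times instead of raising.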
  

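1 answer:

Answer 0 (score: 0)

This is the initial solution I used to filter the points. It works, and it is reasonably fast for datasets of 1,000 to 3,000 points. However, filtering 50,000 points takes 2.5 to 3 hours to complete.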

def dist_haversine(filepath,input_dist,input_crop):
    """
    Drop points that lie within input_dist km of an earlier kept point, using
    the haversine great-circle distance (coordinates in decimal degrees)
    """

    r = shapefile.Reader(filepath)
    idx = np.arange(len(r.records()))
    coordinates = []
    for i in idx:
        geom = r.shape(i)
        coordinates.append(geom.points[0])       

    acoords = np.array(coordinates)

    index = []        
    for r,n,l in itertools.izip(acoords[:,0],acoords[:,1],idx):
        if l in index:
            continue
        else:
            for i,j,k in itertools.izip(acoords[:,0],acoords[:,1], idx):
                if k in index:
                    continue

                else:

                    lon1=r
                    lat1=n
                    lon2=i
                    lat2=j

                    coord_check = ((lon1 == lon2) & (lat1 == lat2))*1

                    if coord_check == 1:
                        continue

                    else:
                        lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])

                        # haversine formula
                        dlon = lon2 - lon1 
                        dlat = lat2 - lat1 
                        a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
                        c = 2 * math.asin(math.sqrt(a)) 
                        km = c*6371 #/1000.0

                    if km < input_dist:
                        if k in index:
                            continue
                        else:
                            index.append(k)

    filterList = [i for j, i in enumerate(coordinates) if j not in index]

    df_coords = pd.DataFrame(filterList, columns=['Lon','Lat'])

    df_coords.insert(0, 'Crop', input_crop)  

    return df_coords.to_csv(directory + "\\" + "%s_distFilter_%skm.csv" % (input_crop, input_dist), sep=",", index=None)
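
The long runtime on 50,000 points probably comes from the pure-Python inner loop and the repeated `k in index` lookups on a list. Below is a hedged sketch of the same greedy idea using NumPy, written for Python 3: for each surviving point, the distances to all points are computed in one vectorized call and a boolean mask records which points are dropped. The names `haversine_to_all` and `thin_points` are hypothetical. One behavioral difference: exact duplicates of a kept point are dropped here, whereas the code above skips identical coordinates.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same Earth radius as in the code above

def haversine_to_all(lon, lat, lons, lats):
    """Distances in km from one point to arrays of points (decimal degrees)."""
    lon, lat = np.radians(lon), np.radians(lat)
    lons, lats = np.radians(lons), np.radians(lats)
    a = (np.sin((lats - lat) / 2) ** 2
         + np.cos(lat) * np.cos(lats) * np.sin((lons - lon) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def thin_points(coords, min_dist_km):
    """Greedy filter: keep a point, drop every other point closer than min_dist_km."""
    coords = np.asarray(coords, dtype=float).reshape(-1, 2)  # columns: lon, lat
    keep = np.ones(len(coords), dtype=bool)
    for i in range(len(coords)):
        if not keep[i]:
            continue  # already dropped by an earlier kept point
        d = haversine_to_all(coords[i, 0], coords[i, 1], coords[:, 0], coords[:, 1])
        close = d < min_dist_km
        close[i] = False  # never drop the point being examined
        keep &= ~close
    return coords[keep]

# Made-up points: the 2nd is about 1 km from the 1st, the 4th about 2 km from the 3rd.
pts = [[0.0, 0.0], [0.0, 0.01], [0.0, 1.0], [0.0, 1.02]]
print(thin_points(pts, 5))
```

This still does one pass over all points per surviving point, so the worst case remains quadratic; the saving comes from moving the inner loop into NumPy. The result can be passed to `pd.DataFrame(..., columns=['Lon','Lat'])` as in the code above.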