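I have a DataFrame of points. The first two columns are the positions. I filter the data based on each point's proximity to the other points. I compute the distances between all points with cdist, then filter the result to find the indices of points that are less than 0.5 apart. I first have to apply two small filters to those indices. The first removes the indices that compare a point with itself: distance[n,n] is always zero, and I don't want to delete every point. The second handles the mirrored comparisons, since distance[n,m] = distance[m,n]. Those leave roughly twice as many entries as the points I need to remove, so I use unique to filter out half.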
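My problem: loc_find is an array of the indices of the rows that should be removed. How do I use this array to delete those numbered rows from my pandas DataFrame without iterating over the DataFrame?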
from scipy.spatial.distance import cdist
import numpy as np
import pandas as pd
# make points and calculate distances
east = data['easting'].values
north = data['northing'].values
points = np.vstack((east, north)).T
distances = cdist(points, points)  # big row x row matrix
zzzz = np.where(distances < 0.5)
# array of index pairs for points that are too close together and will be
# filtered; it still contains unwanted comparisons such as data[1,1] with
# data[1,1], which is always zero since it is the same point, and
# distance[1,2], which is the same as distance[2,1]
loc_dist = np.vstack((zzzz[0], zzzz[1])).T
# My code for filtering the indices
loc_dist = loc_dist.astype('int')
# remove the indices that compare a point with itself, distance[n,n] = 0
diff_loc = zzzz[0] - zzzz[1]
diff_zero = np.where(diff_loc == 0)
loc_dist_s = np.delete(loc_dist, diff_zero[0], axis=0)
# collapse the mirrored comparisons distance[n,m] = distance[m,n]
# into one sorted array of row indices
loc_find = np.unique(loc_dist_s)
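To make the filtering steps above concrete, here is a minimal self-contained sketch on made-up coordinates. The five sample points and the helper name `close_point_indices` are assumptions for illustration only, not part of the original data. Instead of deleting the diagonal and then calling `np.unique` on the pairs, it keeps only the strict upper triangle of the distance matrix with `np.triu(..., k=1)`, which skips both the [n,n] entries and the mirrored [m,n] entries in one step. It should produce the same index array as the code above.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def close_point_indices(points, threshold=0.5):
    """Return the sorted row positions of every point that lies closer
    than `threshold` to at least one other point (hypothetical helper)."""
    distances = cdist(points, points)
    # k=1 keeps only entries strictly above the diagonal, so each pair
    # appears once and self-comparisons are excluded.
    close = np.triu(distances < threshold, k=1)
    rows, cols = np.nonzero(close)
    return np.unique(np.concatenate((rows, cols)))


# Made-up sample data: rows 0/1 and rows 2/3 are close pairs, row 4 is isolated.
data = pd.DataFrame({
    'easting': [0.0, 0.1, 5.0, 5.2, 10.0],
    'northing': [0.0, 0.1, 5.0, 5.1, 10.0],
})
points = data[['easting', 'northing']].to_numpy()
loc_find = close_point_indices(points)
print(loc_find)  # [0 1 2 3]
```

Note that, as in the original code, both members of each close pair end up in `loc_find`. If the goal were to keep one point from each pair, one possible variation would be to return only `np.unique(cols)`.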
Answer 0 (score: 0)
Thanks to @EdChum, I found that these two answered questions worked for me:
A faster alternative to Pandas `isin` function
Select rows from a DataFrame based on values in a column in pandas
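Simply convert the DataFrame index into a column (`.values` is used here because `Index.get_values()` was removed in pandas 1.0):
data.loc[:,'rindex1']=data.index.values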
Then drop the rows with the following:
data_df2=data.loc[~data['rindex1'].isin(loc_find)]
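The same removal can also be sketched without the helper column. Because `loc_find` comes from `np.where` on a NumPy array, it holds row positions rather than index labels, and the two coincide only when the DataFrame has the default 0..n-1 index. The snippet below uses toy data with a deliberately non-default index (the sample values and the names `keep` and `data_df3` are assumptions) and drops rows by position in two ways.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-default index to show positions vs. labels.
data = pd.DataFrame(
    {'easting': [0.0, 0.1, 5.0, 9.0], 'northing': [0.0, 0.1, 5.0, 9.0]},
    index=[10, 11, 12, 13],
)
loc_find = np.array([0, 1])  # row positions to remove (assumed for the demo)

# Option 1: translate positions to labels, then drop those labels.
data_df2 = data.drop(data.index[loc_find])

# Option 2: build a boolean mask over row positions and keep the rest.
keep = np.ones(len(data), dtype=bool)
keep[loc_find] = False
data_df3 = data[keep]

print(data_df2.index.tolist())  # [12, 13]
```

Option 2 works purely on positions, so it may be the safer choice if the index could contain duplicate labels.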
Hope this helps others.