Manipulating a pandas DataFrame based on neighboring coordinates

Time: 2019-11-30 17:03:33

Tags: python pandas dataframe coordinates

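I have the following dataframe showing latitude (y), longitude (x) and elevation (z). A negative number means there is no elevation, i.e. no land.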

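I want to change the elevation of any point that currently has no elevation if it sits next to a point with elevation at the same y, but only when that neighbor's z value is greater than the z value of the point with the same x one y up.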

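Apologies for the explanation; I'm not sure it's the best way to describe it. But in this case: the point (-2, 5) has an elevation of -999999. At that y value it sits right next to (-1, 5), whose elevation of 67 is greater than the elevation of the point (x, y + 1) = (-1, 6), which is 8. In this case I want to change the elevation of (-2, 5) to the elevation of (-2, 6), which is 9.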

此:

    index x    y       z
    0  -5.0  5.0 -999999
    1  -4.0  5.0 -999999
    2  -3.0  5.0 -999999
    3  -2.0  5.0 -999999
    4  -1.0  5.0      67
    5   0.0  5.0      55
    6   1.0  5.0      49
    7   2.0  5.0       7
    8   3.0  5.0       6
    9   4.0  5.0       6
    10 -5.0  6.0      12
    11 -4.0  6.0      12
    12 -3.0  6.0      19
    13 -2.0  6.0       9
    14 -1.0  6.0       8
    15  0.0  6.0       9
    16  1.0  6.0       9
    17  2.0  6.0       7
    18  3.0  6.0       7
    19  4.0  6.0       7

should become:

    index x    y       z adjusted
    0  -5.0  5.0 -999999        0
    1  -4.0  5.0 -999999        0
    2  -3.0  5.0 -999999        0
    3  -2.0  5.0       9        1
    4  -1.0  5.0      67        0
    5   0.0  5.0      55        0
    6   1.0  5.0      49        0
    7   2.0  5.0       7        0
    8   3.0  5.0       6        0
    9   4.0  5.0       6        0
    10 -5.0  6.0      12        0
    11 -4.0  6.0      12        0
    12 -3.0  6.0      19        0
    13 -2.0  6.0       9        0
    14 -1.0  6.0       8        0
    15  0.0  6.0       9        0
    16  1.0  6.0       9        0
    17  2.0  6.0       7        0
    18  3.0  6.0       7        0
    19  4.0  6.0       7        0

How can I process a dataframe like this?
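Below is a minimal plain-Python sketch of the rule as I read it from the question. It can serve as a reference to check any pandas solution against. It rests on a few assumptions that are not stated in the question:

* Only the right-hand neighbor (x + 1, y) counts as "next to", as in the worked example.
* All lookups use the original, unadjusted values.
* A point is left alone when any of the three neighbors it needs is missing.

`adjust_elevations` is a hypothetical helper name.

```python
def adjust_elevations(grid, step=1):
    """Hypothetical helper: apply the question's rule to a dict {(x, y): z}.

    Assumption: only the right-hand neighbour (x + step, y) is checked, and a
    point is skipped if any neighbour it needs is missing from the grid.
    Returns (new_grid, adjusted), where adjusted maps each point to 0 or 1.
    """
    new_grid = dict(grid)
    adjusted = {key: 0 for key in grid}
    for (x, y), z in grid.items():
        if z >= 0:
            continue  # already land, nothing to do
        right = grid.get((x + step, y))
        top_right = grid.get((x + step, y + step))
        top = grid.get((x, y + step))
        if None in (right, top_right, top):
            continue  # edge of the grid
        # right neighbour is land and higher than the point above it
        if right >= 0 and right > top_right:
            new_grid[(x, y)] = top  # take the elevation of (x, y + 1)
            adjusted[(x, y)] = 1
    return new_grid, adjusted


# Data from the question: x = -5..4 for y = 5 and y = 6
xs = range(-5, 5)
z5 = [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6]
z6 = [12, 12, 19, 9, 8, 9, 9, 7, 7, 7]
grid = {(x, 5): z for x, z in zip(xs, z5)}
grid.update({(x, 6): z for x, z in zip(xs, z6)})

new_grid, adjusted = adjust_elevations(grid)
print(new_grid[(-2, 5)], adjusted[(-2, 5)])  # 9 1
```

On the question's data only (-2, 5) changes, which matches the expected table.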

2 answers:

Answer 0 (score: 2)

A pandas-based solution. If I haven't understood your adjustment logic correctly, it should be easy to adapt.

    import pandas as pd

    # copy the question's table (including the "index" header) to the clipboard first
    df = pd.read_clipboard()

    # filter table by relevant (negative) z locations;
    # .copy() avoids SettingWithCopyWarning when columns are added below
    df_neg = df.loc[df.z < 0].copy()

    # get coordinates of relevant locations
    list_x, list_y = df_neg.x, df_neg.y

    # get lists of neighboring points relative to relevant locations
    points_right = list(zip(list_x + 1, list_y))
    points_topright = list(zip(list_x + 1, list_y + 1))
    points_top = list(zip(list_x, list_y + 1))

    # set x, y index for convenient access and initialize adjusted col
    df_idxd = df.set_index(['x', 'y']).assign(adjusted=0)

    # add values of neighboring points to the df_neg table;
    # reindex returns NaN for points that don't exist (plain .loc raises KeyError
    # for missing labels in pandas >= 1.0), and NaN won't bother us below
    df_neg['right'] = df_idxd.z.reindex(points_right).values
    df_neg['topright'] = df_idxd.z.reindex(points_topright).values
    df_neg['top'] = df_idxd.z.reindex(points_top).values

    # get mask which determines whether or not we update
    mask = (df_neg.right >= 0) & (df_neg.right > df_neg.topright)

    # update values in df_neg
    df_neg['z'] = df_neg.z.where(~mask, df_neg.top)
    df_neg['adjusted'] = mask.astype(int)

    # use df_neg to update the full table
    df_idxd.update(df_neg.set_index(['x', 'y']))

    # restore original index
    result = df_idxd.reset_index().set_index('index')
    print(result)

Result:

             x    y         z  adjusted
    index
    0.0   -5.0  5.0 -999999.0       0.0
    1.0   -4.0  5.0 -999999.0       0.0
    2.0   -3.0  5.0 -999999.0       0.0
    3.0   -2.0  5.0       9.0       1.0
    4.0   -1.0  5.0      67.0       0.0
    5.0    0.0  5.0      55.0       0.0
    6.0    1.0  5.0      49.0       0.0
    7.0    2.0  5.0       7.0       0.0
    8.0    3.0  5.0       6.0       0.0
    9.0    4.0  5.0       6.0       0.0
    10.0  -5.0  6.0      12.0       0.0
    11.0  -4.0  6.0      12.0       0.0
    12.0  -3.0  6.0      19.0       0.0
    13.0  -2.0  6.0       9.0       0.0
    14.0  -1.0  6.0       8.0       0.0
    15.0   0.0  6.0       9.0       0.0
    16.0   1.0  6.0       9.0       0.0
    17.0   2.0  6.0       7.0       0.0
    18.0   3.0  6.0       7.0       0.0
    19.0   4.0  6.0       7.0       0.0
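The neighbor lookup in this answer relies on missing (x, y) pairs coming back as NaN. Here is a minimal sketch of that lookup on its own, using a hypothetical 2 x 2 grid. `Series.reindex` on an (x, y) MultiIndex returns NaN for coordinates that are not in the table. In pandas 1.0 and later, plain `.loc` with a list containing missing labels raises `KeyError` instead.

```python
import pandas as pd

# Tiny hypothetical grid: x in {0, 1}, y in {5, 6}
df = pd.DataFrame({'x': [0, 1, 0, 1],
                   'y': [5, 5, 6, 6],
                   'z': [-999999, 67, 9, 8]})
z = df.set_index(['x', 'y'])['z']

# Right-hand neighbours of every point; (2, 5) and (2, 6) are off the grid
right = list(zip(df.x + 1, df.y))
neighbour_z = z.reindex(right)

print(neighbour_z.tolist())  # [67.0, nan, 8.0, nan]
```

Comparisons against NaN evaluate to False, so off-grid neighbors drop out of a mask such as `(right >= 0) & (right > topright)` automatically.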

Answer 1 (score: 2)

Here's what I managed to put together:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                       'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                       'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                             6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})

The first step is to convert the dataframe into a 2D numpy array and work on that, since the relationships are defined on the 2D plane:

    # rows: x (ascending), columns: y (ascending)
    vals = df.set_index(['x', 'y']).unstack().values
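The sketch below shows which axis is which after `unstack()`, using a hypothetical 3 x 2 grid. `set_index(['x', 'y']).unstack()` moves the last index level (y) into the columns. As a result, `vals[i, j]` is the elevation at (x_i, y_j), with rows sorted by x and columns sorted by y. Moving "up" in y therefore means moving one column to the right.

```python
import pandas as pd

# Hypothetical 3 x 2 grid: x in {0, 1, 2}, y in {5, 6}
df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2],
                   'y': [5, 5, 5, 6, 6, 6],
                   'z': [10, 11, 12, 20, 21, 22]})

vals = df.set_index(['x', 'y']).unstack().values

print(vals.shape)  # (3, 2)
print(vals)
# [[10 20]
#  [11 21]
#  [12 22]]
```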

Then compute the mask of the values to be replaced:

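    mask_is_neg = vals < 0

    # vals[1:, :-1] is z at (x + 1, y), vals[1:, 1:] is z at (x + 1, y + 1);
    # pad back to the full grid shape with False in the last row and column
    mask_satisfies_ineq = np.pad(vals[1:, :-1] - vals[1:, 1:] > 0, ((0, 1), (0, 1)),
                                 mode='constant', constant_values=False)

    mask = np.logical_and(mask_is_neg, mask_satisfies_ineq)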
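As a sanity check, here is a sketch that runs the same mask computation on the question's own z values, laid out as rows x = -5..4 and columns y = 5, 6. It then maps the masked cell back to coordinates. `xs`, `ys`, and `ineq` are illustrative names that do not appear in the answer.

```python
import numpy as np

# z values from the question: rows are x = -5..4, columns are y = 5, 6
z5 = [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6]
z6 = [12, 12, 19, 9, 8, 9, 9, 7, 7, 7]
vals = np.column_stack([z5, z6])
xs = np.arange(-5, 5)
ys = np.array([5, 6])

# right neighbour (x + 1, y) higher than (x + 1, y + 1),
# padded with False where a neighbour would fall off the grid
ineq = np.pad(vals[1:, :-1] - vals[1:, 1:] > 0, ((0, 1), (0, 1)),
              mode='constant', constant_values=False)
mask = (vals < 0) & ineq

rows, cols = np.nonzero(mask)
print(list(zip(xs[rows].tolist(), ys[cols].tolist())))  # [(-2, 5)]
```

The inequality alone also holds at x = -1 and x = 0 for y = 5. However, those points already have land, so `vals < 0` filters them out.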

Next, shift the resulting mask by one step in the y direction so that it marks the values we will use as replacements:

    # mask_grab[i, j + 1] = mask[i, j]: marks the point at (x, y + 1) above each masked point
    mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)), mode='constant', constant_values=False)
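Here is a small sketch of the shift on a hypothetical 3 x 3 mask. Dropping the last y column and padding one `False` column on the left moves every `True` from (i, j) to (i, j + 1). Each `True` stays in its own row, so `vals[mask]` and `vals[mask_grab]` list their elements in the same row-major order. That is what lets `vals[mask] = vals[mask_grab]` pair each cell with the one above it. This works because `mask` is always `False` in the last y column (the earlier padding guarantees it), so no `True` is lost by `mask[:, :-1]`.

```python
import numpy as np

# Hypothetical 3 x 3 boolean mask (rows: x, columns: y), False in the last column
mask = np.array([[False, False, False],
                 [True,  False, False],
                 [False, True,  False]])

# each True moves from (i, j) to (i, j + 1), i.e. to the point at y + 1
mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)), mode='constant', constant_values=False)

print(np.argwhere(mask).tolist())       # [[1, 0], [2, 1]]
print(np.argwhere(mask_grab).tolist())  # [[1, 1], [2, 2]]

# boolean-index assignment pairs the cells in the same row-major order
vals = np.arange(9).reshape(3, 3)
vals[mask] = vals[mask_grab]
print(vals.tolist())  # [[0, 1, 2], [4, 4, 5], [6, 8, 8]]
```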

Replace the values:

    vals[mask] = vals[mask_grab]

Flatten the array back into a column and compute the adjusted column:

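    # column-major ('F') order walks all x for one y before moving to the next y,
    # which matches the row order of df (sorted by y, then by x)
    vals = vals.flatten('F')
    adjusted = (vals != df.z).astype(int)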
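The `flatten('F')` step only lines up with `df.z` because `df` is ordered with y as the outer key and ascending x as the inner key (all x for y = 5, then all x for y = 6). The sketch below uses hypothetical data to show the two flatten orders. If the real dataframe might be ordered differently, one option (my suggestion, not part of the answer) is to sort it with `df.sort_values(['y', 'x'], ignore_index=True)` before building `vals`.

```python
import pandas as pd

# Hypothetical grid whose rows are ordered y-major (all x for y=5, then y=6)
df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2],
                   'y': [5, 5, 5, 6, 6, 6],
                   'z': [10, 11, 12, 20, 21, 22]})
vals = df.set_index(['x', 'y']).unstack().values  # rows: x, columns: y

# 'F' walks down each y column: same order as df.z
print(vals.flatten('F').tolist())  # [10, 11, 12, 20, 21, 22]
# 'C' walks along each x row: x-major, would misalign with df
print(vals.flatten('C').tolist())  # [10, 20, 11, 21, 12, 22]
```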

Finally, put these values into the original dataframe:

    df.z = vals
    df['adjusted'] = adjusted