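I have the following dataframe showing latitude (y), longitude (x) and elevation (z). Negative values mean there is no elevation, i.e. no land.
I want to change the elevation of any point that currently has no elevation if it sits next to a point with elevation at the same y, and that neighbor's elevation is greater than the elevation of the point at the same x one y value up.
Apologies for the explanation; I'm not sure this is the best way to describe it. As a concrete case: the point (-2, 5) has an elevation of -999999. At that y value it is next to (-1, 5), whose elevation is 67, which is greater than the elevation of the point at (x, y + 1) = (-1, 6), which is 8. In this case I want to change the elevation of (-2, 5) to that of (-2, 6), which is 9.
This: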
index x y z
0 -5.0 5.0 -999999
1 -4.0 5.0 -999999
2 -3.0 5.0 -999999
3 -2.0 5.0 -999999
4 -1.0 5.0 67
5 0.0 5.0 55
6 1.0 5.0 49
7 2.0 5.0 7
8 3.0 5.0 6
9 4.0 5.0 6
10 -5.0 6.0 12
11 -4.0 6.0 12
12 -3.0 6.0 19
13 -2.0 6.0 9
14 -1.0 6.0 8
15 0.0 6.0 9
16 1.0 6.0 9
17 2.0 6.0 7
18 3.0 6.0 7
19 4.0 6.0 7
Becomes:
index x y z adjusted
0 -5.0 5.0 -999999 0
1 -4.0 5.0 -999999 0
2 -3.0 5.0 -999999 0
3 -2.0 5.0 9 1
4 -1.0 5.0 67 0
5 0.0 5.0 55 0
6 1.0 5.0 49 0
7 2.0 5.0 7 0
8 3.0 5.0 6 0
9 4.0 5.0 6 0
10 -5.0 6.0 12 0
11 -4.0 6.0 12 0
12 -3.0 6.0 19 0
13 -2.0 6.0 9 0
14 -1.0 6.0 8 0
15 0.0 6.0 9 0
16 1.0 6.0 9 0
17 2.0 6.0 7 0
18 3.0 6.0 7 0
19 4.0 6.0 7 0
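How can I do this with a dataframe?
Answer 0 (score: 2)
A pandas-based solution. If I did not understand your adjustment logic correctly, this should be easy to adapt.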
import pandas as pd

df = pd.read_clipboard()
# filter table by relevant (negative) z locations; copy so new columns can be added safely
df_neg = df.loc[df.z < 0].copy()
# get coordinates of relevant locations
list_x, list_y = df_neg.x, df_neg.y
# get lists of neighboring points relative to relevant locations
points_right = list(zip(list_x + 1, list_y))
points_topright = list(zip(list_x + 1, list_y + 1))
points_top = list(zip(list_x, list_y + 1))
# set x, y index for convenient access and initialize adjusted col
df_idxd = df.set_index(['x', 'y']).assign(adjusted=0)
# add values of neighboring points to the df_neg table
# reindex (unlike .loc) yields NaN for points that don't exist;
# comparisons against NaN are False, so they won't bother us below
df_neg['right'] = df_idxd.reindex(points_right).z.values
df_neg['topright'] = df_idxd.reindex(points_topright).z.values
df_neg['top'] = df_idxd.reindex(points_top).z.values
# get mask which determines whether or not we update
mask = (df_neg.right >= 0) & (df_neg.right > df_neg.topright)
# update values in df_neg
df_neg['z'] = df_neg.z.where(~mask, df_neg.top)
df_neg['adjusted'] = mask.astype(int)
# use df_neg to update the full table (update works in place)
df_idxd.update(df_neg.set_index(['x', 'y']))
# restore original index
result = df_idxd.reset_index().set_index('index')
Result:
x y z adjusted
index
0.0 -5.0 5.0 -999999.0 0.0
1.0 -4.0 5.0 -999999.0 0.0
2.0 -3.0 5.0 -999999.0 0.0
3.0 -2.0 5.0 9.0 1.0
4.0 -1.0 5.0 67.0 0.0
5.0 0.0 5.0 55.0 0.0
6.0 1.0 5.0 49.0 0.0
7.0 2.0 5.0 7.0 0.0
8.0 3.0 5.0 6.0 0.0
9.0 4.0 5.0 6.0 0.0
10.0 -5.0 6.0 12.0 0.0
11.0 -4.0 6.0 12.0 0.0
12.0 -3.0 6.0 19.0 0.0
13.0 -2.0 6.0 9.0 0.0
14.0 -1.0 6.0 8.0 0.0
15.0 0.0 6.0 9.0 0.0
16.0 1.0 6.0 9.0 0.0
17.0 2.0 6.0 7.0 0.0
18.0 3.0 6.0 7.0 0.0
19.0 4.0 6.0 7.0 0.0
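As a cross-check on the vectorized logic, the same rule can be written as a plain loop over a coordinate lookup. This is only a sketch under my reading of the question: the helper name `fill_from_above` and the four-point frame are made up for illustration, it assumes every (x, y) pair is unique, and it additionally requires the point above to exist before replacing anything.

```python
import pandas as pd

def fill_from_above(df):
    """Hypothetical helper: apply the adjustment rule point by point."""
    # Map (x, y) -> z so neighbors can be looked up directly.
    z_at = {(x, y): z for x, y, z in zip(df.x, df.y, df.z)}
    new_z, adjusted = [], []
    for x, y, z in zip(df.x, df.y, df.z):
        right = z_at.get((x + 1, y))
        topright = z_at.get((x + 1, y + 1))
        top = z_at.get((x, y + 1))
        neighbors_exist = all(v is not None for v in (right, topright, top))
        # Replace only a "no land" point whose right neighbor is land
        # and is higher than the point diagonally up and to the right.
        hit = bool(z < 0 and neighbors_exist and right >= 0 and right > topright)
        new_z.append(top if hit else z)
        adjusted.append(int(hit))
    return df.assign(z=new_z, adjusted=adjusted)

# Assumed toy input: just the four points from the worked example.
sample = pd.DataFrame({
    "x": [-2.0, -1.0, -2.0, -1.0],
    "y": [5.0, 5.0, 6.0, 6.0],
    "z": [-999999, 67, 9, 8],
})
result = fill_from_above(sample)
print(result)
```

The loop is slower than the indexed version on large tables, but it is easy to read against the wording of the question, which makes it handy for confirming the adjustment logic before optimizing.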
Answer 1 (score: 2)
Here is something I managed to put together:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                   'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                   'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                         6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})
The first step is to convert the dataframe into a 2D numpy matrix and work on that, since the relationships live in the 2D plane:
vals = df.set_index(['x', 'y']).unstack().values
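To see why the slicing in the next steps works, it helps to look at the layout `unstack()` produces: one row per x value and one column per y value. A small sketch on an invented 3×2 grid (not the data from the question):

```python
import pandas as pd

# Assumed toy grid: three x values, two y values.
toy = pd.DataFrame({
    "x": [0, 1, 2, 0, 1, 2],
    "y": [5, 5, 5, 6, 6, 6],
    "z": [10, 11, 12, 20, 21, 22],
})
grid = toy.set_index(["x", "y"]).unstack().values
# grid[i, j] holds z at the i-th x value and the j-th y value,
# so moving down a row is x + 1 and moving right a column is y + 1.
print(grid.shape)
print(grid)
```

With this layout, `vals[1:, :-1]` is the "right neighbor" and `vals[1:, 1:]` is the "top-right neighbor" of each point.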
Then, compute the mask of values to replace:
mask_is_neg = vals < 0
mask_satisfies_ineq = np.pad(vals[1:, :-1] - vals[1:, 1:] > 0, ((0, 1), (0, 1)), mode='constant', constant_values=False)
mask = np.logical_and(mask_is_neg, mask_satisfies_ineq)
Next, shift the resulting mask by one in the y direction to get a mask of the values we will use as replacements:
mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)), mode='constant', constant_values=False)
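The slice-then-pad idiom is simply a shift with False filled in at the edge. A toy illustration on an invented mask:

```python
import numpy as np

mask = np.array([[True, False],
                 [False, False],
                 [False, True]])
# Drop the last column, then pad one False column on the left:
# each True moves one column to the right, and anything that was
# in the last column falls off the edge.
shifted = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                 mode="constant", constant_values=False)
print(shifted)
```

In the answer's code nothing falls off, because the inequality mask was already padded with False in its last column, so `mask` and `mask_grab` select the same number of cells.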
Replace the values:
vals[mask] = vals[mask_grab]
Flatten the array back and compute the adjusted column:
vals = vals.flatten('F')
adjusted = (vals != df.z).astype(int)
Finally, put these values back into the original dataframe:
df.z = vals
df['adjusted'] = adjusted
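Putting the steps together, here is a sketch of the whole approach as one function. The name `adjust_elevation` is mine, and it assumes the points form a complete rectangular grid whose rows are sorted by y and then x, as in the question's table; with a different row order the flattened values would not line up.

```python
import numpy as np
import pandas as pd

def adjust_elevation(df):
    """Hypothetical wrapper around the grid-based steps above."""
    # Rows are x values, columns are y values.
    vals = df.set_index(["x", "y"])["z"].unstack().to_numpy().copy()
    # True where the right neighbor is higher than the top-right neighbor.
    right_gt_topright = np.pad(
        vals[1:, :-1] > vals[1:, 1:], ((0, 1), (0, 1)),
        mode="constant", constant_values=False,
    )
    mask = (vals < 0) & right_gt_topright
    # Same mask shifted one step in y: the cells to copy from.
    mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                       mode="constant", constant_values=False)
    vals[mask] = vals[mask_grab]
    # Column-major flattening restores the "all x for y=5, then y=6" order.
    flat = vals.flatten("F")
    changed = (flat != df["z"].to_numpy()).astype(int)
    return df.assign(z=flat, adjusted=changed)

df = pd.DataFrame({
    "x": np.tile(np.arange(-5, 5), 2),
    "y": np.repeat([5, 6], 10),
    "z": [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6,
          12, 12, 19, 9, 8, 9, 9, 7, 7, 7],
})
out = adjust_elevation(df)
print(out)
```

Returning a new frame via `assign` leaves the input untouched, which makes it easier to compare before and after.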