Question

I have a very large dataframe, similar to this:

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        x2         y2
2.        b        x3         y3
3.        c        x4         y4

I also have a second dataframe, which is a sample of the first one, like this:

     CustomerId   Latitude   Longitude     
0.        a         x1         y1
3.        c         x4         y4

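My goal is to get a new dataframe just like the original one, but with NaN in place of the coordinates for rows whose index does not exist in the second dataframe. This is the result I need: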

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        NaN        NaN
2.        b        NaN        NaN
3.        c        x4         y4

I am new to Python and I haven't found any similar question. Does anyone know how to solve this?

Answer 1

First, we create a mask with isin (pandas.Series.isin, since it is called on the CustomerId column).

After that, we use np.where, and with ~ we ask for the opposite of the mask.

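    import numpy as np

    # True where the row's CustomerId also appears in df2
    mask = df.CustomerId.isin(df2.CustomerId)
    # Where the mask is False, write NaN; otherwise keep the existing value
    df['Latitude'] = np.where(~mask, np.nan, df['Latitude'])
    df['Longitude'] = np.where(~mask, np.nan, df['Longitude'])
    print(df)

         CustomerId   Latitude   Longitude
    0.0       a        x1         y1
    1.0       a        x2         y2
    2.0       b        NaN        NaN
    3.0       c        x4         y4

Explanation: np.where works as follows: np.where(condition, value where the condition is true, value where it is false).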
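Note that the mask in this answer compares CustomerId values, so row 1 (also customer a) keeps its coordinates. If rows should instead be matched on the index, as the question describes, the following may work as a minimal sketch. It assumes df2 was sampled from df and still carries the original index labels. The sample values and the names in_sample and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's dataframes
df = pd.DataFrame(
    {
        "CustomerId": ["a", "a", "b", "c"],
        "Latitude": [1.0, 2.0, 3.0, 4.0],
        "Longitude": [10.0, 20.0, 30.0, 40.0],
    }
)
# Assumption: df2 is a sample of df that keeps the original index labels (0 and 3)
df2 = df.loc[[0, 3]]

# True for rows whose index label also appears in df2
in_sample = df.index.isin(df2.index)

# Work on a copy so the original dataframe is left untouched
result = df.copy()
# Replace the coordinates of every row that is not in the sample with NaN
result.loc[~in_sample, ["Latitude", "Longitude"]] = np.nan

print(result)
```

With this sample data, rows 1 and 2 end up with NaN coordinates while rows 0 and 3 keep theirs, which matches the layout requested in the question. If df2 had its index reset after sampling, the labels would no longer line up and this approach would not apply.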
