Question

I have two dataframes, df1 and df2.

import pandas as pd

d = {'ID': [31,42,63,44,45,26],
     'lat': [64,64,64,64,64,64],
     'lon': [152,152,152,152,152,152],
     'other1': [12,13,14,15,16,17],
     'other2': [21,22,23,24,25,26]}
df1 = pd.DataFrame(data=d)

d2 = {'ID': [27,48,31,45,49,10],
      'LAT': [63,63,63,63,63,63],
      'LON': [153,153,153,153,153,153]}
df2 = pd.DataFrame(data=d2)

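The lat and lon values in df1 are incorrect, but the data in the other columns, which I need to keep track of, is correct. df2 has the correct LAT and LON values, but only a few of its IDs are shared with df1. I want to accomplish two things. First, I want to split df1 into two dataframes: df3, containing the rows whose ID exists in df2, and df4, containing everything else. I can get df3 with: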

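from functools import reduce
import numpy as np

# Collect the rows for each shared ID, then concatenate once
# (DataFrame.append was removed in pandas 2.0)
df3 = pd.concat([df1.loc[df1.ID == i] for i in reduce(np.intersect1d, [df1.ID, df2.ID])])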

But how do I get df4 to contain the remaining rows?

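Second, I want to replace the lat and lon values in df3 with the correct values from df2. I think there must be a neat Pythonic way to do something like the following: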

for j in range(len(df3)):
    for k in range(len(df2)):
        if df3.ID[j] == df2.ID[k]:
            df3.lat[j] = df2.LAT[k]
            df3.lon[j] = df2.LON[k]

But I can't even get the nested loop above to work. If there is a better way to do this in Python, I'd rather not spend a lot of time getting it to work.
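For reference, the nested loop above most likely fails for two reasons: df3 keeps df1's index labels (0 and 4 here), so df3.ID[j] with j from range(len(df3)) raises a KeyError at j = 1, and chained assignment such as df3.lat[j] = ... may write to a temporary copy instead of df3. A minimal sketch of the same loop that avoids both problems, assuming the df1/df2 from the question (df3 is built with a boolean mask here just to keep the example short):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6, 'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6, 'LON': [153] * 6})

# Rows of df1 whose ID appears in df2; .copy() makes df3 an independent frame
df3 = df1[df1.ID.isin(df2.ID)].copy()

# Iterate over index *labels* (0 and 4), not positions 0..len(df3)-1
for j in df3.index:
    for k in df2.index:
        if df3.at[j, 'ID'] == df2.at[k, 'ID']:
            # .at writes a single cell directly on df3 (no chained assignment)
            df3.at[j, 'lat'] = df2.at[k, 'LAT']
            df3.at[j, 'lon'] = df2.at[k, 'LON']

print(df3)
```

This works, but it is O(len(df3) × len(df2)); the vectorized answers below are the better fit for larger frames.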

Answer 1

For question 1, you can use boolean indexing:

m = df1.ID.isin(df2.ID)

df3 = df1[m]
df4 = df1[~m]

print(df3)
print(df4)

Prints:

   ID  lat  lon  other1  other2
0  31   64  152      12      21
4  45   64  152      16      25

   ID  lat  lon  other1  other2
1  42   64  152      13      22
2  63   64  152      14      23
3  44   64  152      15      24
5  26   64  152      17      26
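To make the boolean indexing above concrete, here is a small sketch (same df1/df2 as in the question) that prints the mask and checks that df3 and df4 together contain every row of df1 exactly once. The partition check is only an illustration, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6, 'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6, 'LON': [153] * 6})

# True where df1's ID appears anywhere in df2.ID
m = df1.ID.isin(df2.ID)
print(m.tolist())  # [True, False, False, False, True, False]

df3 = df1[m]   # rows whose ID is in df2
df4 = df1[~m]  # ~ inverts the mask: all remaining rows

# Recombining the two parts and restoring the original order gives df1 back
print(pd.concat([df3, df4]).sort_index().equals(df1))  # True
```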

For question 2:

x = df3.merge(df2, on="ID")[["ID", "other1", "other2", "LAT", "LON"]]
print(x)

Prints:

   ID  other1  other2  LAT  LON
0  31      12      21   63  153
1  45      16      25   63  153

Edit: for question 2, you can also do this:

x = df3.merge(df2, on="ID").drop(columns=["lat", "lon"])
print(x)
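The merge above leaves the corrected values in columns named LAT and LON. If you would rather keep df3's own lat/lon columns (names and order unchanged), one alternative, not part of the original answer, is to look each ID up in df2 with Series.map. This sketch assumes every ID appears at most once in df2; with duplicate IDs the lookup Series would have a non-unique index and map would raise an error.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6, 'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6, 'LON': [153] * 6})

m = df1.ID.isin(df2.ID)
df3 = df1[m].copy()

# Lookup Series keyed by ID (assumes IDs in df2 are unique)
lat_by_id = df2.set_index('ID')['LAT']
lon_by_id = df2.set_index('ID')['LON']

# Overwrite the wrong coordinates in df3, keeping its column layout
df3['lat'] = df3['ID'].map(lat_by_id)
df3['lon'] = df3['ID'].map(lon_by_id)
print(df3)

# The same lookup can also fix df1 in place, touching only the matched rows
df1.loc[m, 'lat'] = df1.loc[m, 'ID'].map(lat_by_id)
df1.loc[m, 'lon'] = df1.loc[m, 'ID'].map(lon_by_id)
```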

Answer 2

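You can merge with an indicator column (indicator='I' below), keep LAT and LON where they exist and fill the rest from lat and lon, then group on the indicator to build a dictionary of dataframes. Then pick each part out of the dictionary by its key: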

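import numpy as np

# Left merge; indicator='I' adds a column marking each row 'both' or 'left_only'
u = df1.merge(df2,on='ID',how='left',indicator='I')
# Prefer df2's LAT/LON; fall back to df1's lat/lon where there was no match
u[['LAT','LON']] = np.where(u[['LAT','LON']].isna(),u[['lat','lon']],u[['LAT','LON']])
u = u.drop(columns=['lat','lon'])  # pandas 2.0+ requires the keyword form here
u['I'] = np.where(u['I'].eq("left_only"),"left_df","others")
d = dict(iter(u.groupby("I")))  # {group label: sub-dataframe}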

print(d['left_df'],'\n--------\n',d['others'])

   ID  other1  other2   LAT    LON        I
1  42      13      22  64.0  152.0  left_df
2  63      14      23  64.0  152.0  left_df
3  44      15      24  64.0  152.0  left_df
5  26      17      26  64.0  152.0  left_df 
--------
    ID  other1  other2   LAT    LON       I
0  31      12      21  63.0  153.0  others
4  45      16      25  63.0  153.0  others
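For comparison, a sketch of the same idea written with fillna instead of np.where, splitting directly on the indicator values rather than building a dictionary. This is an alternative formulation, not the answerer's code; the 'both'/'left_only' labels are the values pandas writes into the indicator column.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6, 'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6, 'LON': [153] * 6})

u = df1.merge(df2, on='ID', how='left', indicator='I')
print(u['I'].tolist())
# ['both', 'left_only', 'left_only', 'left_only', 'both', 'left_only']

# Keep df2's coordinates where there was a match, else fall back to df1's
u['LAT'] = u['LAT'].fillna(u['lat'])
u['LON'] = u['LON'].fillna(u['lon'])
u = u.drop(columns=['lat', 'lon'])

# Split on the indicator: matched rows vs. rows only present in df1
df3 = u[u['I'].eq('both')]
df4 = u[u['I'].eq('left_only')]
print(df3)
print(df4)
```

LAT and LON come out as floats because the left merge introduces NaN for unmatched rows before they are filled.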

python: Replace values in a dataframe with values from another dataframe at specific indexes
