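I am merging two DataFrames with indicator=True so that I can check which values come from the right side and which from the left. This works fine.
Now I need to add a column named id. The column must be numeric, and every row must have a unique value. When the indicator shows left_only, I need to take the maximum value of the id column and add 1 for each row that appears only on the left.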
import pandas as pd
data_right = [{"id": 11, "name": "johnny", "department": "a"}]
data_left = [{"name": "robert", "department": "b"}, {"name": "climber", "department": "b"}]
df_right = pd.DataFrame.from_dict(data_right)
df_left = pd.DataFrame.from_dict(data_left)
df_merged = df_left.merge(df_right, on=["name", "department"], how="outer", indicator=True)
# df["id"] = ??
print(df_merged)
# how to get df["id"] = NaN and increment by 1 based on max value?
In the code above, the id for robert should be 12 and the id for climber should be 13.
Answer 0 (score: 2)
Are you looking for cumsum and fillna?
df_merged['id'] = df_merged['id'].fillna(
df_merged['id'].max() + (df_merged['_merge'] == 'left_only').cumsum())
df_merged
department name id _merge
0 b robert 12.0 left_only
1 b climber 13.0 left_only
2 a johnny 11.0 right_only
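To make the one-liner above easier to follow, here is a minimal sketch that splits it into named steps. The intermediate names (is_left_only, offsets, candidates) are only illustrative and are not part of the original answer. The final astype(int) is an added assumption that no NaN remains in id after filling.

```python
import pandas as pd

df_right = pd.DataFrame([{"id": 11, "name": "johnny", "department": "a"}])
df_left = pd.DataFrame([
    {"name": "robert", "department": "b"},
    {"name": "climber", "department": "b"},
])
df_merged = df_left.merge(
    df_right, on=["name", "department"], how="outer", indicator=True
)

# True for rows that exist only in the left frame (these have no id yet).
is_left_only = df_merged["_merge"] == "left_only"

# Running count of left-only rows: 1 for the first one, 2 for the second, ...
offsets = is_left_only.cumsum()

# Candidate id for every row: current maximum plus the running count.
candidates = df_merged["id"].max() + offsets

# fillna aligns on the index, so only the missing ids take a candidate value.
df_merged["id"] = df_merged["id"].fillna(candidates).astype(int)

print(df_merged)
```

Which left-only row receives 12 and which receives 13 depends on the row order of the merged frame, and the row order of an outer merge can differ between pandas versions. If a specific assignment matters, it is probably safest to sort the frame explicitly before filling.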
Answer 1 (score: 0)
A solution using a custom function and apply.
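import numpy as np

# Start counting from the largest id already present.
start = df_merged['id'].max()

def setid(x):
    # Give each missing id the next value after the running maximum.
    global start
    if np.isnan(x):
        start += 1
        return start
    else:
        return x

df_merged['id'] = df_merged['id'].apply(setid)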
Here you simply increment whenever df_merged['id'] is NaN, without checking the _merge column, so the indicator=True argument is not needed when merging.
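The same idea (fill wherever id is NaN, without looking at _merge) can also be written without a global counter. The following is only a sketch, not the answerer's code: it combines isna() with cumsum(), and the variable name missing is illustrative. It assumes at least one id is already present, since otherwise max() returns NaN.

```python
import pandas as pd

df_right = pd.DataFrame([{"id": 11, "name": "johnny", "department": "a"}])
df_left = pd.DataFrame([
    {"name": "robert", "department": "b"},
    {"name": "climber", "department": "b"},
])

# No indicator=True here: missing ids are detected directly.
df_merged = df_left.merge(df_right, on=["name", "department"], how="outer")

# True where the merge left id empty.
missing = df_merged["id"].isna()

# Largest id already assigned (assumed to exist).
start = df_merged["id"].max()

# Number the missing rows 1, 2, ... and offset them by the current maximum.
df_merged.loc[missing, "id"] = start + missing.cumsum()[missing]
df_merged["id"] = df_merged["id"].astype(int)

print(df_merged)
```

This version treats every NaN id as a row needing a new value, so it would also renumber rows from the right frame whose id happened to be missing. If that distinction matters, the _merge column is still the more precise signal.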