Pandas: add an incremental value where the column is null

Time: 2019-06-25 23:00:00

Tags: python pandas

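I am merging two DataFrames with indicator=True so I can check which values come from the right side and which from the left.

That works fine.

Now I need to add a column named id. It must be numeric, and every row must have a unique value. Where the indicator shows left_only, I need to take the maximum value of the id column and add 1 for each row that appears only on the left.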

import pandas as pd

data_right = [{"id": 11, "name": "johnny", "department": "a"}]
data_left = [{"name": "robert", "department": "b"}, {"name": "climber", "department": "b"}]
df_right = pd.DataFrame.from_dict(data_right)
df_left = pd.DataFrame.from_dict(data_left)
df_merged = df_left.merge(df_right, on=["name", "department"], how="outer", indicator=True)
# df["id"] = ??
print(df_merged)
# how to get df["id"] = NaN and increment by 1 based on max value?

In the code above, the id for robert should be 12 and the id for climber should be 13.

2 answers:

Answer 0: (score: 2)

Are you looking for cumsum and fillna?

df_merged['id'] = df_merged['id'].fillna(
    df_merged['id'].max() + (df_merged['_merge'] == 'left_only').cumsum())

df_merged
  department     name    id      _merge
0          b   robert  12.0   left_only
1          b  climber  13.0   left_only
2          a   johnny  11.0  right_only
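
To make the one-liner above easier to follow, here is a minimal step-by-step sketch of the same fillna plus cumsum idea, using the data from the question. The intermediate names is_left_only, offset and candidate_ids are only illustrative and do not appear in the original answer. The final astype(int) is an extra step, added on the assumption that no NaN remains after filling.

```python
import pandas as pd

df_right = pd.DataFrame([{"id": 11, "name": "johnny", "department": "a"}])
df_left = pd.DataFrame([
    {"name": "robert", "department": "b"},
    {"name": "climber", "department": "b"},
])
df_merged = df_left.merge(
    df_right, on=["name", "department"], how="outer", indicator=True
)

# True for rows that exist only in the left frame (illustrative name)
is_left_only = df_merged["_merge"] == "left_only"

# Running count 1, 2, ... that advances on each left-only row
offset = is_left_only.cumsum()

# Largest existing id plus the running count gives a candidate id per row
candidate_ids = df_merged["id"].max() + offset

# Only the missing ids are filled; existing ids are left untouched.
# astype(int) assumes every NaN was filled (true for this data).
df_merged["id"] = df_merged["id"].fillna(candidate_ids).astype(int)

print(df_merged)
```

The new ids are handed out in row order, and the row order of an outer merge can differ between pandas versions, so which left-only row receives 12 and which receives 13 may not match the output shown above.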

Answer 1: (score: 0)

A solution using a custom function and apply.

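import numpy as np

# Start counting from the largest existing id
start = df_merged['id'].max()

def setid(x):
    # Give each missing value the next id; keep existing ids unchanged
    global start
    if np.isnan(x):
        start += 1
        return start
    else:
        return x

df_merged['id'] = df_merged['id'].apply(setid)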

Here you simply increment whenever df_merged['id'] is NaN, without checking the _merge column, so the indicator=True argument is not needed when merging.
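
The same NaN-only idea can probably be written without a global counter. The following is a minimal sketch, not part of the original answer: it builds a small hypothetical frame df by hand in place of the merged result, and the name missing is illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame: two rows without an id
df = pd.DataFrame({
    "name": ["robert", "climber", "johnny"],
    "id": [float("nan"), float("nan"), 11.0],
})

# True wherever the id is missing; no _merge column is consulted
missing = df["id"].isna()

# Running count over the missing rows, offset by the largest existing id
df["id"] = df["id"].fillna(df["id"].max() + missing.cumsum()).astype(int)

print(df)
#       name  id
# 0   robert  12
# 1  climber  13
# 2   johnny  11
```

This only gives the intended result if every NaN in id really marks a row that needs a new id, which holds for the data in the question.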