Replace values in one DataFrame with values from another DataFrame when a common value is found in a specific column

Time: 2019-04-29 10:27:29

Tags: python pandas

For project IDs that exist in both DataFrames, I'm trying to replace the hours in df with the hours from replacements:

import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else' :['a', 'b', 'c', 'd', 'e']
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']

print(df)

However, only project ID 3 gets the correct value (1000). Projects 2 and 5 both get NaN:

   project_ids   hours else
0            1   111.0    a
1            2     NaN    b
2            3  1000.0    c
3            4   444.0    d
4            5     NaN    e
  1. How can I fix this?
  2. Is there a better way?
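Here is why the loop produces NaN. When the right-hand side of `df.loc[mask, 'hours'] = ...` is a Series, pandas aligns it on index labels, not on position. The replacement for project 2 sits at index 0 in replacements but at index 1 in df. The labels do not match, so NaN is written, and the same happens for project 5. Project 3 works only because it happens to sit at index 2 in both frames.

One way to fix the loop itself is to iterate over plain values, so that no alignment happens. This is a minimal sketch and assumes each project ID appears at most once in replacements:

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

# zip() yields plain scalars, so each assignment broadcasts one value to the
# matching row(s) instead of aligning a Series on its index labels.
for project, new_hours in zip(replacements['project_ids'], replacements['hours']):
    df.loc[df['project_ids'] == project, 'hours'] = new_hours

print(df)
```

No NaN is ever written, so hours stays an integer column. The row-by-row loop is fine for a handful of IDs. For larger frames, the vectorised answers below avoid scanning df once per replacement.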

3 answers:

Answer 0 (score: 1)

Use Series.map with a Series built from replacements via DataFrame.set_index, then fall back to the original hours with fillna:

s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e
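It can help to look at the intermediate result to see why the fillna step is needed. The sketch below splits the one-liner into two steps. The final `astype('int64')` cast is an optional addition, not part of the answer, and it is only safe once no NaN remains:

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})

replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

# Index = project ID, values = replacement hours.
s = replacements.set_index('project_ids')['hours']

# Step 1: look up each project ID in s. IDs missing from replacements
# (1 and 4 here) come back as NaN, which upcasts the result to float64.
mapped = df['project_ids'].map(s)
print(mapped.tolist())  # [nan, 666.0, 1000.0, nan, 999.0]

# Step 2: fill the gaps with the original hours. Every value is now a whole
# number, so casting back to int64 is safe (optional).
df['hours'] = mapped.fillna(df['hours']).astype('int64')
print(df)
```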

Answer 1 (score: 1)

Another approach using df.update():

m = df.set_index('project_ids')
# update() aligns on the index (project_ids) and the Series name ('hours'), modifying m in place
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())

   project_ids   hours else
0            1   111.0    a
1            2   666.0    b
2            3  1000.0    c
3            4   444.0    d
4            5   999.0    e
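update() is convenient, but its alignment rules are worth knowing. The sketch below uses a hypothetical replacements frame in which project 9 does not exist in df and project 3 has a missing value. It shows that unmatched rows are ignored and that NaN never overwrites an existing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})

# Hypothetical data: project 9 is not in df, project 3 has no replacement value.
replacements = pd.DataFrame({
    'project_ids': [2, 9, 3],
    'hours': [666, 777, np.nan],
})

m = df.set_index('project_ids')
# update() modifies m in place and returns None. Only non-NA values from the
# other object are used, and labels not present in m (project 9) are dropped.
m.update(replacements.set_index('project_ids')['hours'])
result = m.reset_index()
print(result)
```

The hours column may come back as float or int depending on the pandas version, so compare values rather than dtypes if that matters.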

Answer 2 (score: 0)

Another solution is to use pandas.merge first and then fillna:

df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=('_1', ''))
# Keep the replacement hours where a match exists, otherwise fall back to the original value
df_new['hours'] = df_new['hours'].fillna(df_new['hours_1'])
df_new = df_new.drop(columns='hours_1')

print(df_new)
   project_ids else   hours
0            1    a   111.0
1            2    b   666.0
2            3    c  1000.0
3            4    d   444.0
4            5    e   999.0
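One caveat applies to the merge approach. If a project ID appears more than once in replacements, a left merge produces one output row per match, so df silently grows. The sketch below uses a hypothetical dup_replacements frame to show this. It also shows the validate parameter of pandas.merge, which raises MergeError when the right-hand keys are not unique:

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})

# Hypothetical data: project 2 appears twice.
dup_replacements = pd.DataFrame({
    'project_ids': [2, 2, 5],
    'hours': [666, 777, 999],
})

# Without validation, project 2 matches twice and its row is duplicated.
unchecked = pd.merge(df, dup_replacements, on='project_ids', how='left',
                     suffixes=('_1', ''))
print(len(unchecked))  # 6 rows instead of 5

# validate='many_to_one' checks that the right-hand keys are unique.
raised = False
try:
    pd.merge(df, dup_replacements, on='project_ids', how='left',
             suffixes=('_1', ''), validate='many_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('MergeError:', exc)
```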