For project IDs that exist in both DataFrames, I am trying to replace the hours in df with the hours from replacements:
import pandas as pd
df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e']
})
replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})
for project in replacements['project_ids']:
    df.loc[df['project_ids'] == project, 'hours'] = replacements.loc[replacements['project_ids'] == project, 'hours']
print(df)
However, only project ID 3 gets the correct value (1000), while projects 2 and 5 both end up with NaN:
project_ids hours else
0 1 111.0 a
1 2 NaN b
2 3 1000.0 c
3 4 444.0 d
4 5 NaN e
Answer 0 (score: 1)
Use Series.map with a Series built from replacements via DataFrame.set_index, then fill the unmatched rows with the original hours:
s = replacements.set_index('project_ids')['hours']
df['hours'] = df['project_ids'].map(s).fillna(df['hours'])
print(df)
project_ids hours else
0 1 111.0 a
1 2 666.0 b
2 3 1000.0 c
3 4 444.0 d
4 5 999.0 e
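To see why this works, it can help to look at the intermediate result of map: IDs that are missing from the lookup Series come back as NaN, and fillna then restores the original hours for those rows. A minimal sketch on the same data follows. The name mapped is only for illustration, and the final astype(int) is an optional addition that assumes all hours are whole numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})
replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

# Lookup Series: index = project ID, value = replacement hours
s = replacements.set_index('project_ids')['hours']

# IDs absent from the lookup (1 and 4) map to NaN
mapped = df['project_ids'].map(s)

# Fall back to the original hours where no replacement exists,
# then cast back to int (assumes there are no fractional hours)
df['hours'] = mapped.fillna(df['hours']).astype(int)
print(df)
```

The cast is what removes the .0 from the printed column. Drop it if the hours can be fractional.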
Answer 1 (score: 1)
Another approach, using df.update():
m = df.set_index('project_ids')
m.update(replacements.set_index('project_ids')['hours'])
print(m.reset_index())
project_ids hours else
0 1 111.0 a
1 2 666.0 b
2 3 1000.0 c
3 4 444.0 d
4 5 999.0 e
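Two details of DataFrame.update are easy to miss: it modifies the frame in place and returns None, and it aligns on the index, which is why both frames are indexed by project_ids first. A small sketch under those assumptions follows. The extra ID 99 is hypothetical, added only to show that IDs not present in df are ignored.

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})
# ID 99 is a hypothetical extra row that has no match in df
replacements = pd.DataFrame({
    'project_ids': [2, 5, 3, 99],
    'hours': [666, 999, 1000, 1],
})

m = df.set_index('project_ids')

# update() works in place and returns None, so do not assign its result to m
result = m.update(replacements.set_index('project_ids')['hours'])

out = m.reset_index()
print(out)
```

Because update returns None, writing m = m.update(...) would discard the frame.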
Answer 2 (score: 0)
Another solution is to use pandas.merge first and then fillna:
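# Left merge keeps every row of df; df's hours become 'hours_1', replacements' stay 'hours'
df_new = pd.merge(df, replacements, on='project_ids', how='left', suffixes=('_1', ''))
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df_new['hours'] = df_new['hours'].fillna(df_new['hours_1'])
df_new = df_new.drop(columns='hours_1')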
print(df_new)
project_ids else hours
0 1 a 111.0
1 2 b 666.0
2 3 c 1000.0
3 4 d 444.0
4 5 e 999.0
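As a side note on why the loop in the question yields NaN: assigning a Series through .loc aligns on index labels. The replacement rows for projects 2 and 5 sit at index 0 and 1 in replacements but at index 1 and 4 in df, so nothing lines up and NaN is written. Project 3 happens to be at index 2 in both frames. A minimal sketch of the original loop with a scalar on the right-hand side, which avoids the alignment, assuming each ID appears only once in replacements (new_hours is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'project_ids': [1, 2, 3, 4, 5],
    'hours': [111, 222, 333, 444, 555],
    'else': ['a', 'b', 'c', 'd', 'e'],
})
replacements = pd.DataFrame({
    'project_ids': [2, 5, 3],
    'hours': [666, 999, 1000],
})

for project in replacements['project_ids']:
    new_hours = replacements.loc[replacements['project_ids'] == project, 'hours']
    # .iloc[0] extracts a plain scalar, so no index alignment takes place
    # (assumes each project ID occurs once in replacements)
    df.loc[df['project_ids'] == project, 'hours'] = new_hours.iloc[0]
print(df)
```

The vectorized answers above are still preferable for large frames, since this loop scans df once per replacement.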