Merge two DataFrames on matching index to update other columns in a DataFrame

Time: 2019-07-08 17:15:52

Tags: python pandas dataframe

I have the following code:

import pandas as pd

w = pd.Series(['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY' ])
x = pd.Series([101, 102, 103, 104, 105, 106, 107, 108, 109])
y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
z = pd.Series(['', '', '', '', '', '', '', '', ''])
a = pd.Series(['', '', '', '', '', '', '', '', ''])
longdf = pd.DataFrame({'consultant': w, 'invoice_number':x, 'paid_value':y, 'manager':z, 'department':a})

w = pd.Series(['BAIN', 'KPMG', 'EY', 'EY' ])
x = pd.Series([101, 104, 107, 108])
y = pd.Series([13000, 14000, 15000, 16000])
z = pd.Series(['Dawn', 'Brody', 'Tang', 'Vos'])
a = pd.Series(['Data Science', 'Automation', 'Sourcing', 'Sourcing'])
shortdf = pd.DataFrame({'consultant': w, 'invoice_number':x, 'paid_value':y, 'lead_manager_name':z, 'department_num':a})

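# First attempt: a left merge on the key columns (the result is not used below)
combo = longdf.merge(shortdf, on=['consultant', 'invoice_number'], how='left')
indexer = ['consultant', 'invoice_number']

# Align shortdf's column names with longdf's and index both frames by the key columns
shortdf = shortdf.set_index(indexer).rename(columns={'lead_manager_name': 'manager', 'department_num': 'department'})
longdf = longdf.set_index(indexer)
# DataFrame.update modifies longdf in place and returns None, so its result is not assigned
longdf.update(shortdf, join='left')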

The goal is to update only those rows of longdf whose index matches an index entry in shortdf. The target output was attached as an image.
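A minimal sketch of that goal, assuming the column names in shortdf are first renamed to match longdf and both frames are indexed by the key columns. The names `keys` and `result` are illustrative only. Note that `DataFrame.update` works in place and returns `None`, so the sketch updates a copy rather than assigning the return value.

```python
import pandas as pd

keys = ['consultant', 'invoice_number']  # illustrative name for the key columns

longdf = pd.DataFrame({
    'consultant': ['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY'],
    'invoice_number': [101, 102, 103, 104, 105, 106, 107, 108, 109],
    'paid_value': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'manager': [''] * 9,
    'department': [''] * 9,
}).set_index(keys)

shortdf = pd.DataFrame({
    'consultant': ['BAIN', 'KPMG', 'EY', 'EY'],
    'invoice_number': [101, 104, 107, 108],
    'paid_value': [13000, 14000, 15000, 16000],
    'lead_manager_name': ['Dawn', 'Brody', 'Tang', 'Vos'],
    'department_num': ['Data Science', 'Automation', 'Sourcing', 'Sourcing'],
}).rename(columns={'lead_manager_name': 'manager',
                   'department_num': 'department'}).set_index(keys)

# update() aligns shortdf to longdf's index and overwrites only the cells
# where shortdf has a non-missing value; unmatched rows are left untouched.
result = longdf.copy()
result.update(shortdf)

print(result.reset_index())
```

Rows such as invoice 102 have no entry in shortdf, so they should keep their original `paid_value` and empty `manager` and `department` fields.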

1 Answer:

Answer 0 (score: 1):

Well, there is pandas.merge for this. Should the result look like this?

#Output
   consultant  invoice_number  paid_value  lead_manager_name  department_num
0  BAIN        101             13000.0     Dawn               Data Science
1  BAIN        102             13000.0     Dawn               Data Science
2  BAIN        103             13000.0     Dawn               Data Science
3  KPMG        104             14000.0     Brody              Automation
4  KPMG        105             14000.0     Brody              Automation
5  KPMG        106             14000.0     Brody              Automation
6  EY          107             15000.0     Tang               Sourcing
7  EY          108             16000.0     Vos                Sourcing
8  EY          109             16000.0     Vos                Sourcing

And here is the code:

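# Run on the original longdf and shortdf (before set_index and rename)
cols_to_use = ["consultant", "invoice_number"]
# Left-merge shortdf onto longdf's key columns, then forward-fill rows that had no match
joint = pd.merge(longdf[cols_to_use], shortdf, on=cols_to_use, how="left").ffill()
joint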
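
Note that ffill() also fills rows that had no match (invoices 102 and 103 receive the values of invoice 101, for example). If only the matching rows should change, as the question asks, one possible merge-based variant is sketched below. It assumes shortdf's columns are renamed to match longdf first; the `_new` suffix and the names `keys`, `value_cols` and `merged` are illustrative choices, not part of the original code.

```python
import pandas as pd

keys = ['consultant', 'invoice_number']
value_cols = ['paid_value', 'manager', 'department']

longdf = pd.DataFrame({
    'consultant': ['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY'],
    'invoice_number': [101, 102, 103, 104, 105, 106, 107, 108, 109],
    'paid_value': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'manager': [''] * 9,
    'department': [''] * 9,
})

shortdf = pd.DataFrame({
    'consultant': ['BAIN', 'KPMG', 'EY', 'EY'],
    'invoice_number': [101, 104, 107, 108],
    'paid_value': [13000, 14000, 15000, 16000],
    'lead_manager_name': ['Dawn', 'Brody', 'Tang', 'Vos'],
    'department_num': ['Data Science', 'Automation', 'Sourcing', 'Sourcing'],
}).rename(columns={'lead_manager_name': 'manager', 'department_num': 'department'})

# Left merge keeps every longdf row; shortdf's values arrive in *_new columns
# and are missing (NaN) for rows without a match.
merged = longdf.merge(shortdf, on=keys, how='left', suffixes=('', '_new'))

for col in value_cols:
    # Prefer the shortdf value where one exists, otherwise keep longdf's value.
    merged[col] = merged[col + '_new'].fillna(merged[col])

merged = merged.drop(columns=[col + '_new' for col in value_cols])
print(merged)
```

Compared with the ffill() version, unmatched invoices keep whatever longdf already held instead of inheriting the previous row's manager and department.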