I have the following code:
import pandas as pd
w = pd.Series(['BAIN', 'BAIN', 'BAIN', 'KPMG', 'KPMG', 'KPMG', 'EY', 'EY', 'EY' ])
x = pd.Series([101, 102, 103, 104, 105, 106, 107, 108, 109])
y = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
z = pd.Series(['', '', '', '', '', '', '', '', ''])
a = pd.Series(['', '', '', '', '', '', '', '', ''])
longdf = pd.DataFrame({'consultant': w, 'invoice_number':x, 'paid_value':y, 'manager':z, 'department':a})
w = pd.Series(['BAIN', 'KPMG', 'EY', 'EY' ])
x = pd.Series([101, 104, 107, 108])
y = pd.Series([13000, 14000, 15000, 16000])
z = pd.Series(['Dawn', 'Brody', 'Tang', 'Vos'])
a = pd.Series(['Data Science', 'Automation', 'Sourcing', 'Sourcing'])
shortdf = pd.DataFrame({'consultant': w, 'invoice_number':x, 'paid_value':y, 'lead_manager_name':z, 'department_num':a})
combo = longdf.merge(shortdf, on = ['consultant', 'invoice_number'], how = 'left')
indexer = ['consultant', 'invoice_number']
shortdf = shortdf.set_index(indexer).rename(columns={'lead_manager_name':'manager', 'department_num':'department'})
longdf = longdf.set_index(indexer)
longdf.update(shortdf, join = 'left')  # update() modifies longdf in place and returns None
new = longdf
Answer 0 (score: 1)
Well, there is pandas.merge for this. Should the result look like this?
#Output
   consultant  invoice_number  paid_value  lead_manager_name  department_num
0        BAIN             101     13000.0               Dawn    Data Science
1        BAIN             102     13000.0               Dawn    Data Science
2        BAIN             103     13000.0               Dawn    Data Science
3        KPMG             104     14000.0              Brody      Automation
4        KPMG             105     14000.0              Brody      Automation
5        KPMG             106     14000.0              Brody      Automation
6          EY             107     15000.0               Tang        Sourcing
7          EY             108     16000.0                Vos        Sourcing
8          EY             109     16000.0                Vos        Sourcing
And here is the code:
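# Assumes longdf and shortdf as first constructed in the question (before set_index).
cols_to_use = ["consultant", "invoice_number"]
# Left-merge on the key columns, then forward-fill the rows that had no match in shortdf.
joint = pd.merge(longdf[cols_to_use], shortdf, on=cols_to_use, how="left").ffill()
joint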
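One caveat with a plain .ffill(): it fills down the whole frame, so if the first invoice of a consultant had no match in shortdf, that row would inherit the previous consultant's values. With the sample data this does not happen, but here is a minimal sketch of a variant that fills within each consultant only. It assumes the original (not yet re-indexed) frames from the question, rebuilt here in shortened form, and the names `keys`, `fill_cols` and `merged` are just illustrative:

```python
import pandas as pd

# Rebuild the question's frames (only the key columns are needed from longdf).
longdf = pd.DataFrame({
    "consultant": ["BAIN"] * 3 + ["KPMG"] * 3 + ["EY"] * 3,
    "invoice_number": list(range(101, 110)),
})
shortdf = pd.DataFrame({
    "consultant": ["BAIN", "KPMG", "EY", "EY"],
    "invoice_number": [101, 104, 107, 108],
    "paid_value": [13000, 14000, 15000, 16000],
    "lead_manager_name": ["Dawn", "Brody", "Tang", "Vos"],
    "department_num": ["Data Science", "Automation", "Sourcing", "Sourcing"],
})

keys = ["consultant", "invoice_number"]
merged = longdf.merge(shortdf, on=keys, how="left")

# Forward-fill per consultant so values never carry over from one firm to the next.
fill_cols = ["paid_value", "lead_manager_name", "department_num"]
merged[fill_cols] = merged.groupby("consultant")[fill_cols].ffill()

print(merged)
```

On this data the result should match the output shown above. The difference only shows up when a consultant's first row is missing from shortdf, in which case that row stays NaN instead of borrowing from the group before it.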