I want to merge two DataFrames:
df1:
                                        cik0        cik1        cik2
'MKTG, INC.'                            0001019056  None        None
1 800 FLOWERS COM INC                   0001104659  0001437749  None
11 GOOD ENERGY INC                      0000930413  None        None
1347 CAPITAL CORP                       0001144204  None        None
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131  None        None
df2:
   cik         Ticker
0  0001144204  AABB
1  0001019056  A
2  0001387131  AABC
3  0001437749  AA
4  0000930413  AAACU
Expected result:
                                        cik0        cik1        cik2  ticker
'MKTG, INC.'                            0001019056  None        None  A
1 800 FLOWERS COM INC                   0001104659  0001437749  None  AA
11 GOOD ENERGY INC                      0000930413  None        None  AAACU
1347 CAPITAL CORP                       0001144204  None        None  AABB
1347 PROPERTY INSURANCE HOLDINGS, INC.  0001387131  None        None  AABC
I want to match cik0 against df2['cik']; if that finds no match, I want to try cik1, and so on.
Thanks for your help!
Answer 0 (score: 4)
You can use pd.Series.map together with fillna a few times:
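# Series that maps each cik (index) to its Ticker (values)
ticker_map = df2.set_index('cik')['Ticker']
# try cik0 first, then fill the remaining gaps from cik1, then from cik2
df1['ticker'] = df1['cik0'].map(ticker_map)\
                           .fillna(df1['cik1'].map(ticker_map))\
                           .fillna(df1['cik2'].map(ticker_map))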
However, this is a bit tedious. You can define a function that does this iteratively:
def apply_map_on_cols(df, cols, mapper):
    # map the first column, then fill remaining gaps from each later column in order
    s = df[cols[0]].map(mapper)
    for col in cols[1:]:
        s = s.fillna(df[col].map(mapper))
    return s

df1['ticker'] = df1.pipe(apply_map_on_cols,
                         cols=[f'cik{i}' for i in range(3)],
                         mapper=df2.set_index('cik')['Ticker'])
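For anyone who wants to try the map/fillna approach end to end, here is a minimal sketch that rebuilds the sample frames from the question. It assumes the CIK values are stored as strings (so the leading zeros survive) and that the missing entries are None; the helper name first_mapped_value is only illustrative and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; CIKs are assumed to be strings.
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413", "0001144204", "0001387131"],
        "cik1": [None, "0001437749", None, None, None],
        "cik2": [None, None, None, None, None],
    },
    index=[
        "'MKTG, INC.'",
        "1 800 FLOWERS COM INC",
        "11 GOOD ENERGY INC",
        "1347 CAPITAL CORP",
        "1347 PROPERTY INSURANCE HOLDINGS, INC.",
    ],
)
df2 = pd.DataFrame(
    {
        "cik": ["0001144204", "0001019056", "0001387131", "0001437749", "0000930413"],
        "Ticker": ["AABB", "A", "AABC", "AA", "AAACU"],
    }
)

# Lookup Series: index is the cik, values are the tickers.
ticker_map = df2.set_index("cik")["Ticker"]


def first_mapped_value(df, cols, mapper):
    """Illustrative helper: map each column in turn, keeping the first hit per row."""
    result = df[cols[0]].map(mapper)
    for col in cols[1:]:
        # Only rows that are still NaN are filled from the next column.
        result = result.fillna(df[col].map(mapper))
    return result


df1["ticker"] = first_mapped_value(df1, ["cik0", "cik1", "cik2"], ticker_map)
print(df1)
```

Rows whose cik values appear nowhere in df2 would simply keep NaN in the ticker column.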
Answer 1 (score: 0)
Another possibility is to merge the DataFrames with pd.merge:
import pandas as pd

dfs = []  # list to temporarily store the partially merged frames
df1.reset_index(inplace=True)  # keep the old index as an 'index' column
for col in ['cik0', 'cik1', 'cik2']:  # iterate over the cik columns only
    # inner merge: only rows whose value in this column appears in df2['cik'] survive
    dfs.append(pd.merge(df1, df2, left_on=col, right_on='cik'))
# concatenate all partial results
df_result = pd.concat(dfs, axis=0)
df_result.set_index('index', inplace=True)  # restore the old index
df_result.drop('cik', axis=1, inplace=True)  # drop the helper 'cik' column
This should be several times faster than any approach using map when df1.shape[0] >> df1.shape[1] (>> meaning "much greater than"), which should be true for most practical datasets.
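A self-contained sketch of the merge-based idea, rebuilt on the sample data from the question, might look like the following. Two things here are my own assumptions rather than part of the answer: the index labels of df1 are unique, and a drop_duplicates step is added so that a row matching in more than one cik column keeps only its first match (cik0 before cik1 before cik2).

```python
import pandas as pd

# Sample data copied from the question; CIKs are assumed to be strings.
df1 = pd.DataFrame(
    {
        "cik0": ["0001019056", "0001104659", "0000930413", "0001144204", "0001387131"],
        "cik1": [None, "0001437749", None, None, None],
        "cik2": [None, None, None, None, None],
    },
    index=[
        "'MKTG, INC.'",
        "1 800 FLOWERS COM INC",
        "11 GOOD ENERGY INC",
        "1347 CAPITAL CORP",
        "1347 PROPERTY INSURANCE HOLDINGS, INC.",
    ],
)
df2 = pd.DataFrame(
    {
        "cik": ["0001144204", "0001019056", "0001387131", "0001437749", "0000930413"],
        "Ticker": ["AABB", "A", "AABC", "AA", "AAACU"],
    }
)

cik_cols = ["cik0", "cik1", "cik2"]

# Work on a copy so df1 itself is untouched; the unnamed index
# becomes a regular column called "index".
left = df1.reset_index()

# One inner merge per cik column; each part holds only the rows that matched.
parts = [left.merge(df2, left_on=col, right_on="cik") for col in cik_cols]

# Stack the parts in priority order and keep the first match per original row.
tickers = (
    pd.concat(parts)
    .drop_duplicates(subset="index")
    .set_index("index")["Ticker"]
)

# Assignment aligns on the index, so the original row order is preserved
# and rows without any match would get NaN.
result = df1.assign(ticker=tickers)
print(result)
```

Assigning the ticker Series back onto df1, instead of using the concatenated frame directly, keeps unmatched rows in the output and restores the original row order.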