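I want to join two DataFrames based on a relationship described by a dictionary of lists, where each key in the dictionary refers to an id in the idA column of dfA and the items in each list are ids in the idB column of dfB. The DataFrames and the dictionary look like this: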
dfA
colA colB idA
0 a abc 3
1 b def 4
2 b ghi 5
dfB
colX idB colZ
0 bob 7 a
1 bob 7 b
2 bob 7 c
3 jim 8 d
4 jake 9 a
5 jake 9 e
myDict = { '3': [ '7', '8' ], '4': [], '5': ['7', '9'] }
How can I use myDict to join the two DataFrames and produce a DataFrame like the one below?
dfC
   colA colB idA  colX   idB   colZ
0  a    abc  3    bob    7     a
1                              b
2                              c
3                 jim    8     d
4  b    def  4    None   None  None
5  b    ghi  5    bob    7     a
6                              b
7                              c
8                 jake   9     a
9                              e
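Answer 0 (score: 1)
You can build a link table (a DataFrame) from the dictionary. A complete working example follows. You may still need to sort the rows and reorder the columns at the end to reproduce your output exactly.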
import pandas as pd
import numpy as np
dfA = pd.DataFrame({'colA': ('a', 'b', 'b'),
'colB': ('abc', 'def', 'ghi'),
'idA': ('3', '4', '5')})
dfB = pd.DataFrame({'colX': ('bob', 'bob', 'bob', 'jim', 'jake', 'jake'),
'idB': ('7', '7', '7', '8', '9', '9'),
'colZ': ('a', 'b', 'c', 'd', 'a', 'e')})
myDict = {'3': ['7', '8'], '4': [], '5': ['7', '9']}
dfC = pd.DataFrame(columns=['idA', 'idB'])
i = 0
for key, value in myDict.items():
    # an empty list still produces one record, with NaN as idB
    if not value:
        dfC.loc[i, 'idA'] = key
        dfC.loc[i, 'idB'] = np.nan
        i += 1
    for val in value:
        dfC.loc[i, 'idA'] = key
        dfC.loc[i, 'idB'] = val
        i += 1
temp = dfA.merge(dfC, how='right')
result = temp.merge(dfB, how='outer')
print(result)
The output is:
colA colB idA idB colX colZ
0 a abc 3 7 bob a
1 a abc 3 7 bob b
2 a abc 3 7 bob c
3 b ghi 5 7 bob a
4 b ghi 5 7 bob b
5 b ghi 5 7 bob c
6 a abc 3 8 jim d
7 b def 4 NaN NaN NaN
8 b ghi 5 9 jake a
9 b ghi 5 9 jake e
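As a possible shortcut, the link table can also be built in one pass with a list comprehension instead of filling dfC row by row. The sketch below is not part of the original answer: the names pairs and link are made up for illustration, and it assumes the ids are strings, as in the example above. Using how='left' in both merges keeps the row order of dfA, which may remove the need for a sorting step.

```python
import pandas as pd

dfA = pd.DataFrame({'colA': ('a', 'b', 'b'),
                    'colB': ('abc', 'def', 'ghi'),
                    'idA': ('3', '4', '5')})
dfB = pd.DataFrame({'colX': ('bob', 'bob', 'bob', 'jim', 'jake', 'jake'),
                    'idB': ('7', '7', '7', '8', '9', '9'),
                    'colZ': ('a', 'b', 'c', 'd', 'a', 'e')})
myDict = {'3': ['7', '8'], '4': [], '5': ['7', '9']}

# One (idA, idB) pair per list item; an empty list yields a single
# pair with a missing idB so that the idA row is not lost.
pairs = [(key, val)
         for key, vals in myDict.items()
         for val in (vals or [None])]
link = pd.DataFrame(pairs, columns=['idA', 'idB'])

# Left merges preserve the order of the left-hand frame.
result = (dfA.merge(link, on='idA', how='left')
             .merge(dfB, on='idB', how='left'))

# Reorder the columns to match the layout asked for in the question.
result = result[['colA', 'colB', 'idA', 'colX', 'idB', 'colZ']]
print(result)
```

The final column selection only changes the column order; it does not blank out the repeated values shown in the question's dfC.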
Answer 1 (score: 0)
This is not the best solution, but it is fairly simple and gets the job done:
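# Assumption (not shown in the original answer): dfA has a helper column 'idAaux'
# holding the myDict list for each idA, and dfB['idB'] is numeric.
dfA['idAaux'] = dfA['idA'].astype(str).map(myDict)
# One row per (idA, idB) pair; stack() discards the None padding (classic dropna=True default)
temp = pd.DataFrame(dfA.idAaux.tolist(), index=dfA.idA).stack()
temp = temp.reset_index()[['idA', 0]]
temp.columns = ['idA', 'idB']
# The left merge restores ids whose list is empty (idB becomes NaN)
temp2 = dfA.merge(temp, on='idA', how='left').drop('idAaux', axis=1)
temp2['idB'] = pd.to_numeric(temp2['idB'])
res = temp2.merge(dfB, on='idB', how='left')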
Output:
colA colB idA idB colX colZ
0 a abc 3 7.0 bob a
1 a abc 3 7.0 bob b
2 a abc 3 7.0 bob c
3 a abc 3 8.0 jim d
4 b def 4 NaN NaN NaN
5 b ghi 5 7.0 bob a
6 b ghi 5 7.0 bob b
7 b ghi 5 7.0 bob c
8 b ghi 5 9.0 jake a
9 b ghi 5 9.0 jake e
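The same idea can probably be written more directly with Series.explode (available since pandas 0.25), which turns each list item into its own row and an empty list into NaN. This is only a sketch and not the answerer's code: it assumes idA and idB are stored as integers (which the 7.0 values in the output above suggest), and the name link is made up for illustration.

```python
import pandas as pd

dfA = pd.DataFrame({'colA': ['a', 'b', 'b'],
                    'colB': ['abc', 'def', 'ghi'],
                    'idA': [3, 4, 5]})
dfB = pd.DataFrame({'colX': ['bob', 'bob', 'bob', 'jim', 'jake', 'jake'],
                    'idB': [7, 7, 7, 8, 9, 9],
                    'colZ': ['a', 'b', 'c', 'd', 'a', 'e']})
myDict = {'3': ['7', '8'], '4': [], '5': ['7', '9']}

# Look up the list for each idA (the dictionary keys are strings),
# then give every list item its own row. explode() keeps the original
# index and turns an empty list into NaN.
link = dfA['idA'].astype(str).map(myDict).explode()

# The ids in myDict are strings; convert them so they match dfB['idB'].
link = pd.to_numeric(link).rename('idB')

# join() aligns on the index, so each dfA row is repeated once per idB.
res = dfA.join(link).merge(dfB, on='idB', how='left')
print(res)
```

Because of the NaN for idA 4, the idB column ends up as float, just as in the output above.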