What is the best way to merge a df like this:
+------------+----------+
| domain     | username |
+------------+----------+
| @gmail.com | gagaga   |
+------------+----------+
| @mail.com  | bobo     |
+------------+----------+
with a dict like this:
domain_to_app = {
    '@gmail.com': ['gmail', 'youtube', 'gdrive'],
    '@mail.com': ['email', 'dropbox'],
}
to get this:
+------------+----------+-----------+
| domain     | username | app       |
+------------+----------+-----------+
| @gmail.com | gagaga   | gmail     |
+------------+----------+-----------+
| @gmail.com | gagaga   | youtube   |
+------------+----------+-----------+
| @gmail.com | gagaga   | gdrive    |
+------------+----------+-----------+
| @mail.com  | bobo     | email     |
+------------+----------+-----------+
| @mail.com  | bobo     | dropbox   |
+------------+----------+-----------+
Is it better to convert the dict into a df with repeated rows and use merge, or should I first use map and then unstack the app column?
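The answers below use df (called df1 in Answer 1) and domain_to_app without defining them. A minimal setup reconstructed from the first table and the dict in the question, so the snippets can be run as-is:

```python
import pandas as pd

# Input frame reconstructed from the first table in the question
df = pd.DataFrame({
    'domain': ['@gmail.com', '@mail.com'],
    'username': ['gagaga', 'bobo'],
})

# Mapping from the question: each domain owns a list of apps
domain_to_app = {
    '@gmail.com': ['gmail', 'youtube', 'gdrive'],
    '@mail.com': ['email', 'dropbox'],
}

print(df)
```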
Answer 0 (score: 1)
You can use map to create a new Series, then combine chain.from_iterable with repeat to build a new DataFrame:
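import pandas as pd
from itertools import chain

# Map each domain to its list of apps (one list per row)
s = df['domain'].map(domain_to_app)
# Length of each list = how many times each original row must be repeated
lens = s.str.len()
df = pd.DataFrame({
    'domain': df['domain'].values.repeat(lens),
    'username': df['username'].values.repeat(lens),
    # Flatten the lists into one sequence aligned with the repeated rows
    'app': list(chain.from_iterable(s))
})
print(df)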
       domain  username      app
0  @gmail.com    gagaga    gmail
1  @gmail.com    gagaga  youtube
2  @gmail.com    gagaga   gdrive
3   @mail.com      bobo    email
4   @mail.com      bobo  dropbox
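On pandas 0.25 or newer, DataFrame.explode performs the repeat-and-flatten step in a single call. The sketch below swaps in explode for the chain/repeat technique above (it is not the answer's code) and recreates the question's df and domain_to_app so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'domain': ['@gmail.com', '@mail.com'],
                   'username': ['gagaga', 'bobo']})
domain_to_app = {
    '@gmail.com': ['gmail', 'youtube', 'gdrive'],
    '@mail.com': ['email', 'dropbox'],
}

# Store each row's list of apps in a new column, then give every list
# element its own row. explode keeps the original index (0, 0, 0, 1, 1),
# so it is reset afterwards.
result = (df.assign(app=df['domain'].map(domain_to_app))
            .explode('app')
            .reset_index(drop=True))
print(result)
```

Because every other column is carried along automatically, this also covers the "many columns" case without listing them one by one.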
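If you need to repeat many columns, start again from the original df: create a DataFrame from the mapped values, reshape it with stack, and "repeat" the rows with join: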
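# Add a column holding each row's list of apps
df['app'] = df['domain'].map(domain_to_app)
# Expand the lists into columns, stack them into one Series indexed by the
# original row label, and join it back so each row repeats once per app
df = (df.join(pd.DataFrame(df.pop('app').values.tolist(), index=df.index)
                .stack()
                .dropna()  # drop padding from shorter lists (some pandas versions keep it after stack)
                .reset_index(level=1, drop=True)
                .rename('app'))
        .reset_index(drop=True))
print(df)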
       domain  username      app
0  @gmail.com    gagaga    gmail
1  @gmail.com    gagaga  youtube
2  @gmail.com    gagaga   gdrive
3   @mail.com      bobo    email
4   @mail.com      bobo  dropbox
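The join "repeats" rows because the stacked Series keeps each app under the label of the row it came from, so label 0 appears three times and label 1 twice; a left join on the index emits one output row per matching label. A small sketch of just that step, with the stacked Series written out by hand as a stand-in for the stack/reset_index output:

```python
import pandas as pd

left = pd.DataFrame({'domain': ['@gmail.com', '@mail.com'],
                     'username': ['gagaga', 'bobo']})

# Hand-written stand-in for the stacked Series: one entry per app,
# indexed by the label of the row it came from
apps = pd.Series(['gmail', 'youtube', 'gdrive', 'email', 'dropbox'],
                 index=[0, 0, 0, 1, 1], name='app')

# A left join on the index emits one output row per matching label
joined = left.join(apps).reset_index(drop=True)
print(joined)
```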
Answer 1 (score: 1)
Try this:
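import pandas as pd
# Reshape the dict to long format; shorter lists are padded with missing values
df2 = pd.DataFrame.from_dict(domain_to_app, orient='index').unstack().reset_index()
# Drop the padding rows and merge on the domain (df1 is the question's DataFrame)
result = pd.merge(df1, df2[df2[0].notnull()], left_on=['domain'], right_on=['level_1'])
# Keep only the needed columns and give the app column its name
result = result[['domain', 'username', 0]].rename(columns={0: 'app'})
print(result)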
Output:
       domain  username      app
0  @gmail.com    gagaga    gmail
1  @gmail.com    gagaga  youtube
2  @gmail.com    gagaga   gdrive
3   @mail.com      bobo    email
4   @mail.com      bobo  dropbox
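Explanation:
Create a DataFrame from the dictionary and reshape it to long format (one row per domain/app pair), drop the missing values that pad the shorter lists, merge it with df1 on the domain with pd.merge, then keep and rename only the columns you need.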
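The same merge can be done without the padding values and the level_1 / 0 column names by building the long table straight from the dict with a list comprehension. A sketch of that alternative construction (not the answer's code), assuming the question's frame is named df1 as in this answer:

```python
import pandas as pd

df1 = pd.DataFrame({'domain': ['@gmail.com', '@mail.com'],
                    'username': ['gagaga', 'bobo']})
domain_to_app = {
    '@gmail.com': ['gmail', 'youtube', 'gdrive'],
    '@mail.com': ['email', 'dropbox'],
}

# One row per (domain, app) pair, so there is no padding to filter out
df2 = pd.DataFrame(
    [(domain, app) for domain, apps in domain_to_app.items() for app in apps],
    columns=['domain', 'app'],
)

# Inner merge on the shared 'domain' column; each user row repeats once per app
result = df1.merge(df2, on='domain', how='inner')
print(result)
```

Since both frames share the column name domain, a single on= key is enough and no post-merge renaming is needed.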