I have two dataframes:
import pandas as pd
data = [['138249','Cat']
,['103669','Cat']
,['191826','Cat']
,['196655','Cat']
,['103669','Cat']
,['116780','Dog']
,['184831','Dog']
,['196655','Dog']
,['114333','Dog']
,['123757','Dog']]
df1 = pd.DataFrame(data, columns = ['Hash','Name'])
print(df1)
data2 = [
'138249',
'103669',
'191826',
'196655',
'116780',
'184831',
'114333',
'123757',]
df2 = pd.DataFrame(data2, columns = ['Hash'])
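I want to write code that takes each hash in the second dataframe, looks it up in the leftmost column (Hash) of the first dataframe, and returns all matching values from the first dataframe in a single cell of the second dataframe.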
This is the result I want:
This is what I have tried:
#attempt one: use groupby to squish up the dataset. No results
past = df1.groupby('Hash')
print(past)
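On its own, groupby only builds a lazy DataFrameGroupBy object, so printing it shows the object's repr rather than any data. A minimal sketch, reusing the question's df1, showing that an aggregation step is what produces one row per hash (agg(list) is used here just to make every match visible):

```python
import pandas as pd

data = [['138249', 'Cat'], ['103669', 'Cat'], ['191826', 'Cat'],
        ['196655', 'Cat'], ['103669', 'Cat'], ['116780', 'Dog'],
        ['184831', 'Dog'], ['196655', 'Dog'], ['114333', 'Dog'],
        ['123757', 'Dog']]
df1 = pd.DataFrame(data, columns=['Hash', 'Name'])

# groupby alone computes nothing; it just describes how to split the rows.
grouped = df1.groupby('Hash')
print(type(grouped).__name__)  # DataFrameGroupBy

# Aggregating collapses each group to one row; list keeps every match.
names_per_hash = grouped['Name'].agg(list)
print(names_per_hash)
```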
#attempt two: use merge. Result: empty dataframe
past1 = pd.merge(df1, df2, right_index=True, left_on='Hash')
print(past1)
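The merge attempt most likely fails because right_index=True compares df1.Hash (strings such as '138249') against df2's default integer index 0..7, so no key can ever match. Depending on the pandas version this shows up as an empty result or as a dtype-mismatch error. A sketch that joins on the Hash column of both frames instead and then collapses the matches, assuming every hash in df2 has at least one match in df1:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['138249', 'Cat'], ['103669', 'Cat'], ['191826', 'Cat'],
     ['196655', 'Cat'], ['103669', 'Cat'], ['116780', 'Dog'],
     ['184831', 'Dog'], ['196655', 'Dog'], ['114333', 'Dog'],
     ['123757', 'Dog']],
    columns=['Hash', 'Name'])
df2 = pd.DataFrame(['138249', '103669', '191826', '196655',
                    '116780', '184831', '114333', '123757'],
                   columns=['Hash'])

# Join on the Hash column in both frames, not on df2's integer index.
# how='left' keeps every row of df2, in df2's order.
merged = pd.merge(df2, df1, on='Hash', how='left')

# One row per hash, in order of first appearance (sort=False).
# An unmatched hash would carry NaN, which ','.join cannot handle.
result = merged.groupby('Hash', sort=False)['Name'].agg(','.join).reset_index()
print(result)
```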
#attempt three: use pivot. Result: not the right format.
past2 = df1.pivot(index = None, columns = 'Hash', values = 'Name')
print(past2)
I can do this in Excel with the VBA code here, but when I apply it to my real dataset the code crashes (probably because the dataset is too large, about 30,000 rows).
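Row counts of this size are usually unproblematic for vectorized pandas operations. The sketch below builds hypothetical random stand-in data of about 30,000 rows (the real hashes and names are unknown) and runs the groupby-and-join approach on it; no timing figures are claimed, the printed elapsed time is simply whatever your machine measures:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data, roughly the size mentioned (~30,000 rows).
rng = np.random.default_rng(0)
n_rows = 30_000
big = pd.DataFrame({
    'Hash': rng.integers(100_000, 200_000, size=n_rows).astype(str),
    'Name': rng.choice(['Cat', 'Dog', 'Bird'], size=n_rows),
})
# Hypothetical lookup frame: every distinct hash once, like df2.
lookup = pd.DataFrame({'Hash': big['Hash'].unique()})

start = time.perf_counter()
combined = big.groupby('Hash')['Name'].agg(','.join)  # one string per hash
lookup['Name'] = lookup['Hash'].map(combined)         # attach to lookup rows
elapsed = time.perf_counter() - start
print(f'{len(lookup)} hashes combined in {elapsed:.3f} s')
```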
Answer (score: 3)
IIUC, first aggregate df1 with groupby and agg(','.join), then reindex with df2.Hash:
df1.groupby('Hash')['Name'].agg(','.join).reindex(df2.Hash).reset_index()
Hash Name
0 138249 Cat
1 103669 Cat,Cat
2 191826 Cat
3 196655 Cat,Dog
4 116780 Dog
5 184831 Dog
6 114333 Dog
7 123757 Dog
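The one-liner above can be split into its steps. A sketch under the same df1/df2 as the question, also showing Series.map as an alternative second step that writes the combined names straight into df2 as a new column (useful if df2 has other columns to keep):

```python
import pandas as pd

df1 = pd.DataFrame(
    [['138249', 'Cat'], ['103669', 'Cat'], ['191826', 'Cat'],
     ['196655', 'Cat'], ['103669', 'Cat'], ['116780', 'Dog'],
     ['184831', 'Dog'], ['196655', 'Dog'], ['114333', 'Dog'],
     ['123757', 'Dog']],
    columns=['Hash', 'Name'])
df2 = pd.DataFrame(['138249', '103669', '191826', '196655',
                    '116780', '184831', '114333', '123757'],
                   columns=['Hash'])

# Step 1: one row per hash, all matching names joined into one string.
combined = df1.groupby('Hash')['Name'].agg(','.join)

# Step 2a (the answer's approach): reindex to df2's hashes and order.
via_reindex = combined.reindex(df2['Hash']).reset_index()

# Step 2b (alternative): look up each hash of df2 in the combined Series.
df2['Name'] = df2['Hash'].map(combined)

# In both variants, a hash in df2 with no match in df1 becomes NaN.
print(via_reindex)
print(df2)
```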