import pandas as pd

df1 = pd.DataFrame({'Product_ID': ["55165", "80125,30215", "55557", "92361", "32619,28965,20147", "88722", "82793", "70809, 20201", "11367"],
                    'Product': ["ABC", 'FDA', 'FSD', 'JGH', 'TYE', 'BVC', 'LKJ', 'HJD', 'POI'],
                    'Country': ['CN', 'US', 'GB', 'AG', 'MX', 'CA', 'DE', 'CA', 'SG']})
df2 = pd.DataFrame({'Deal_ID': [70809, 88722, 82793, 20201, 55165, 30215, 11367]})
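I want to add the Country and Product information from df1 to df2, matching each Deal_ID in df2 against Product_ID in df1. I tried a join, but Product_ID in df1 is not numeric: it is a string, and some rows hold several comma-separated IDs. Is there a way to do this?
Thanks in advance for your help.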
Answer 0 (score: 4)
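You can do this in a few steps: split the Product_ID strings, then chain and repeat the series appropriately, remembering to convert from str to int: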
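from itertools import chain
import numpy as np

# Split each comma-separated Product_ID string into a list of ID strings
split = df1['Product_ID'].str.split(',')
# Number of IDs in each row, used to repeat the other columns
lens = split.map(len)
# One row per ID: repeat Country/Product, flatten the lists, cast to int
# (int() ignores surrounding whitespace, so ' 20201' parses fine)
df1 = pd.DataFrame({'Country': np.repeat(df1['Country'], lens),
                    'Product': np.repeat(df1['Product'], lens),
                    'Deal_ID': list(map(int, chain.from_iterable(split)))})
# Inner merge on the shared Deal_ID column
df2 = df2.merge(df1)
print(df2)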
   Deal_ID Country Product
0    70809      CA     HJD
1    88722      CA     BVC
2    82793      DE     LKJ
3    20201      CA     HJD
4    55165      CN     ABC
5    30215      US     FDA
6    11367      SG     POI
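
As an alternative to the chain/repeat approach above, the same one-row-per-ID table can be sketched with `DataFrame.explode`, which is available in pandas 0.25 and later. This is not part of the original answer; the names `lookup` and `result` are illustrative, and `how='left'` is an assumption made here so that any Deal_ID without a match would be kept with NaN values rather than dropped.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Product_ID': ["55165", "80125,30215", "55557", "92361", "32619,28965,20147",
                   "88722", "82793", "70809, 20201", "11367"],
    'Product': ["ABC", 'FDA', 'FSD', 'JGH', 'TYE', 'BVC', 'LKJ', 'HJD', 'POI'],
    'Country': ['CN', 'US', 'GB', 'AG', 'MX', 'CA', 'DE', 'CA', 'SG'],
})
df2 = pd.DataFrame({'Deal_ID': [70809, 88722, 82793, 20201, 55165, 30215, 11367]})

# Build a lookup table with one row per individual ID (hypothetical name: lookup)
lookup = (
    df1.assign(Deal_ID=df1['Product_ID'].str.split(','))  # lists of ID strings
       .explode('Deal_ID')                                # one row per list element
       .reset_index(drop=True)                            # drop the duplicated index
       .assign(Deal_ID=lambda d: d['Deal_ID'].str.strip().astype('int64'))
       .drop(columns='Product_ID')
)

# Left merge keeps every row of df2, in df2's original order
result = df2.merge(lookup, on='Deal_ID', how='left')
print(result)
```

Unlike the code above, this leaves `df1` and `df2` untouched and puts the merged table in a new variable. The resulting columns come out as Deal_ID, Product, Country, so the column order differs from the output shown above even though the matched values are the same.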