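I have two DataFrames, say df1 and df2.
df1 has the columns (name, surname, department).
df2 has the columns (id, filename).
I want to merge them into a third DataFrame, df3, with the columns (id, filename, name, surname, department).
What they have in common is that each filename ends with the worker's name.
Example: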
Filename : /company/workers/john
Name : john (no duplicate name values in df1 or df2)
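A merge normally relies on a common column, but here there is none. How can I use this kind of match/similarity to combine the two DataFrames? And if I cannot use this similarity, how else can I merge them?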
Answer 0 (score: 1)
You just need to split the filename column of df2 on / and take the last component:
df2['name'] = df2['filename'].str.split('/').str[-1]
Then use the name column in df2 as the key for the merge :)
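For completeness, here is a minimal end-to-end sketch of that approach. The sample rows are made up for illustration (they are not from the question), and it assumes the names in df1 use the same letter case as the filenames.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "name": ["john", "steven"],
    "surname": ["smith", "lee"],
    "department": ["dep1", "dep2"],
})
df2 = pd.DataFrame({
    "id": [240, 250],
    "filename": ["/company/workers/steven", "/company/workers/john"],
})

# Derive the join key: the last path component of each filename.
df2["name"] = df2["filename"].str.split("/").str[-1]

# Inner join on the derived key; rows follow the order of df2.
df3 = df2.merge(df1, on="name")
print(df3)
```

Putting df2 on the left gives the column order asked for in the question (id, filename, name, surname, department).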
Answer 1 (score: 1)
Try this:
pd.merge(df1, df2.apply(lambda x: pd.Series({"name": x.filename.split("/")[-1], "file_id": x.id, "filename": x.filename}), axis=1), on="name", how="left")
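Because this is a left join with df1 on the left, workers without a matching file stay in the result with missing values. A possible variant, sketched below, builds the key with vectorized string methods instead of a row-wise apply and uses merge's indicator option to list the unmatched workers. The sample rows, including the extra worker "maria", are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "maria" has no file on purpose.
df1 = pd.DataFrame({
    "name": ["john", "steven", "maria"],
    "surname": ["smith", "lee", "garcia"],
    "department": ["dep1", "dep2", "dep3"],
})
df2 = pd.DataFrame({
    "id": [240, 250],
    "filename": ["/company/workers/steven", "/company/workers/john"],
})

# Build the key column without apply, and rename id to file_id as in the answer.
keyed = (
    df2.assign(name=df2["filename"].str.rsplit("/", n=1).str[-1])
       .rename(columns={"id": "file_id"})
)

# indicator=True adds a "_merge" column describing where each row came from.
merged = pd.merge(df1, keyed, on="name", how="left", indicator=True)

# Workers that exist in df1 but have no file in df2.
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)
```

Note that file_id becomes a float column here, since the unmatched row holds NaN.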
Answer 2 (score: 0)
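Use str.rsplit(r"/", n=1, expand=True)[1].str.title(), where
rsplit: splits starting from the right
n=1: performs at most one split
r"/": a raw string, so no escape sequences are interpreted
expand=True: returns the pieces as separate columns
title: capitalizes the first letter (steven --> Steven)
Then merge the two DataFrames on "name".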
In [25]: df1 = pd.DataFrame({"name": ["John", "Steven"], "surname": ["Smith", "Lee"], "department": ["dep1", "dep2"]})
In [26]: df2 = pd.DataFrame({"id": [240, 250], "filename": ["/company/workers/steven", "/company/workers/john"]})
In [27]: df1
Out[27]:
     name surname department
0    John   Smith       dep1
1  Steven     Lee       dep2
In [28]: df2
Out[28]:
    id                 filename
0  240  /company/workers/steven
1  250    /company/workers/john
In [29]: df2["name"] = df2.filename.str.rsplit(r"/", n=1, expand=True)[1].str.title()
In [30]: df2
Out[30]:
    id                 filename    name
0  240  /company/workers/steven  Steven
1  250    /company/workers/john    John
In [31]: pd.merge(df2, df1, on="name")
Out[31]:
    id                 filename    name surname department
0  240  /company/workers/steven  Steven     Lee       dep2
1  250    /company/workers/john    John   Smith       dep1
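One caveat with .str.title() is that it only helps when the names in df1 are written in title case; a name such as "McDonald" would not match "Mcdonald". A possible alternative, sketched below, is to lowercase a temporary key on both sides and drop it after the merge. The validate option is used here to check the question's claim that names are unique. The helper column name "key" is an assumption chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["John", "Steven"],
    "surname": ["Smith", "Lee"],
    "department": ["dep1", "dep2"],
})
df2 = pd.DataFrame({
    "id": [240, 250],
    "filename": ["/company/workers/steven", "/company/workers/john"],
})

# Temporary lowercase key on both sides ("key" is a hypothetical helper column).
left = df2.assign(key=df2["filename"].str.rsplit("/", n=1).str[-1].str.lower())
right = df1.assign(key=df1["name"].str.lower())

# validate="one_to_one" raises MergeError if either side has duplicate keys.
df3 = left.merge(right, on="key", validate="one_to_one").drop(columns="key")
print(df3)
```

The name column in the result keeps the spelling from df1, since the lowercase key is only used for matching.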