I have two DataFrames, df1 and df2. The first looks like this:
  Surname Knownas    TB
0       K       S  79.3
1       T       J  79.1
2       I       S  78.3
3       P       B  78.2
4       W       A  78.1
The other looks like this:
  Mathematics Name
0          A*  H,E
1          A*  P,E
2          A*  L,J
3          A*  W,D
4           A  C,K
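I want to join these two DataFrames, but there is a problem.
I want to use Name as the key for df2, but for df1 I need to concatenate the Surname and Knownas fields with a comma between them and use the result as the key. In other words, the keys from df1 would be "K,S", "T,J", "I,S" and so on.
I have read and re-read the manual, but I can't work out how to do this.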
Answer 0 (score: 1)
I would expand the Name column into two columns (Surname and Knownas) and then merge on the Surname and Knownas columns of both DFs:
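from io import StringIO  # standard library; replaces six.StringIO on Python 3

import pandas as pd

data = """\
Surname Knownas TB
0 K S 79.3
1 T J 79.1
2 I S 78.3
3 P B 78.2
4 W A 78.1
"""
# Whitespace-separated columns; the first column is the index.
# A raw string avoids the invalid-escape warning for '\s'.
df1 = pd.read_csv(StringIO(data), sep=r'\s+', index_col=0)
print(df1)

data = """\
Mathematics Name
0 A* H,E
1 A* P,E
2 A* L,J
3 A* W,D
4 A C,K
5 A K,S
"""
df2 = pd.read_csv(StringIO(data), sep=r'\s+', index_col=0)
print(df2)

# Split "Surname,Knownas" into two new columns.
df2[['Surname', 'Knownas']] = df2.Name.str.split(',', expand=True)
print(df2)

# Inner join (the default) on both key columns.
merge = pd.merge(df1, df2, on=['Surname', 'Knownas'])
print(merge)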
Output:
  Surname Knownas    TB
0       K       S  79.3
1       T       J  79.1
2       I       S  78.3
3       P       B  78.2
4       W       A  78.1

  Mathematics Name
0          A*  H,E
1          A*  P,E
2          A*  L,J
3          A*  W,D
4           A  C,K
5           A  K,S

  Mathematics Name Surname Knownas
0          A*  H,E       H       E
1          A*  P,E       P       E
2          A*  L,J       L       J
3          A*  W,D       W       D
4           A  C,K       C       K
5           A  K,S       K       S

  Surname Knownas    TB Mathematics Name
0       K       S  79.3           A  K,S
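If the real Name values might contain stray spaces or more than one comma, the split may need to be a bit more defensive. The following is only a sketch under that assumption; the sample rows are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: one clean name, one with stray spaces,
# and one containing a second comma.
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A", "A"],
    "Name": ["H,E", " K , S ", "O,Brien,J"],
})

# n=1 splits only on the first comma, so the result always has
# exactly two columns regardless of how many commas a name contains.
parts = df2["Name"].str.split(",", n=1, expand=True)

# Strip surrounding whitespace so the keys compare equal during a merge.
df2["Surname"] = parts[0].str.strip()
df2["Knownas"] = parts[1].str.strip()
print(df2)
```

Whether splitting on the first comma is the right rule depends on how the names were written in the real data.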
Alternatively, you can create a Name column in DF1 and merge the two DFs on the Name column:
df1['Name'] = df1.Surname + ',' + df1.Knownas
merge = pd.merge(df1, df2, on=['Name'])
PS: I intentionally added row 5 to the second DataFrame so that at least one row matches.
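Since only one row matches, it may help to see which df1 rows found no partner. Here is a self-contained sketch of the Name-key approach using a left join with pandas' indicator option; the choice of how="left" is an assumption about what is wanted, not something the question specifies.

```python
import pandas as pd

# Same sample data as in the answer, built directly as DataFrames.
df1 = pd.DataFrame({
    "Surname": ["K", "T", "I", "P", "W"],
    "Knownas": ["S", "J", "S", "B", "A"],
    "TB": [79.3, 79.1, 78.3, 78.2, 78.1],
})
df2 = pd.DataFrame({
    "Mathematics": ["A*", "A*", "A*", "A*", "A", "A"],
    "Name": ["H,E", "P,E", "L,J", "W,D", "C,K", "K,S"],
})

# Build the "Surname,Knownas" key in df1 so it matches df2's Name column.
df1["Name"] = df1["Surname"] + "," + df1["Knownas"]

# how="left" keeps every df1 row; indicator=True adds a "_merge" column
# that is "both" for matched rows and "left_only" for unmatched ones.
merged = pd.merge(df1, df2, on="Name", how="left", indicator=True)
print(merged)
```

Rows marked left_only have NaN in the Mathematics column; dropping how and indicator gives back the inner join shown in the answer.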