I have df1:
A B C
0 15:00:00 2002-01-13 8
1 15:00:00 2002-01-13 9
2 15:30:00 2002-01-13 8
3 15:30:00 2002-01-13 9
4 16:00:00 2002-01-13 8
5 16:00:00 2002-01-13 9
6 15:00:00 2002-01-14 17
7 15:00:00 2002-01-14 19
8 15:30:00 2002-01-14 17
9 15:30:00 2002-01-14 19
10 15:00:00 2002-01-15 38
11 15:00:00 2002-01-15 40
12 15:30:00 2002-01-15 38
13 15:30:00 2002-01-15 40
14 16:00:00 2002-01-15 38
15 16:00:00 2002-01-15 40
and df2:
A B C
0 16:00:00 2002-01-13 9
1 16:00:00 2002-01-15 38
I want a new DataFrame, df3, containing the rows of df1 that match some row of df2 on both columns:
df1["B"] == df2["B"] and df1["C"] == df2["C"]
df3 should be:
A B C
1 15:00:00 2002-01-13 9 # df1["B"] == df2["B"] and df1["C"] == df2["C"]
3 15:30:00 2002-01-13 9
5 16:00:00 2002-01-13 9
10 15:00:00 2002-01-15 38
12 15:30:00 2002-01-15 38
14 16:00:00 2002-01-15 38
Answer 0 (score: 2)
You can use pd.merge (note that the result gets a new 0..n-1 index):
df1.merge(df2, on=['B','C'],suffixes=('','_y')).drop('A_y',axis=1)
A B C
0 15:00:00 2002-01-13 9
1 15:30:00 2002-01-13 9
2 16:00:00 2002-01-13 9
3 15:00:00 2002-01-15 38
4 15:30:00 2002-01-15 38
5 16:00:00 2002-01-15 38
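Another option is boolean indexing, which keeps the original index of df1 (this checks B and C independently, which is sufficient for this data):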
df1[df1.B.isin(df2.B) & df1.C.isin(df2.C)]
A B C
1 15:00:00 2002-01-13 9
3 15:30:00 2002-01-13 9
5 16:00:00 2002-01-13 9
10 15:00:00 2002-01-15 38
12 15:30:00 2002-01-15 38
14 16:00:00 2002-01-15 38
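Because the two isin checks are independent, a row whose B matches one row of df2 and whose C matches a different row of df2 would also be selected. The sample data has no such row, so the sketch below uses small made-up frames (row 2 is hypothetical, added only to show the difference) and matches (B, C) pairs through a MultiIndex instead:

```python
import pandas as pd

# Made-up data for illustration; row 2 pairs a B from one df2 row
# with a C from the other df2 row.
df1 = pd.DataFrame({
    "A": ["15:00:00", "15:00:00", "15:30:00", "15:30:00"],
    "B": ["2002-01-13", "2002-01-13", "2002-01-15", "2002-01-15"],
    "C": [8, 9, 9, 38],
})
df2 = pd.DataFrame({
    "A": ["16:00:00", "16:00:00"],
    "B": ["2002-01-13", "2002-01-15"],
    "C": [9, 38],
})

# Independent checks: row 2 passes because "2002-01-15" and 9 each
# appear somewhere in df2, although never together.
loose = df1[df1.B.isin(df2.B) & df1.C.isin(df2.C)]

# Pairwise check: build (B, C) keys and test membership of the pair.
keys1 = pd.MultiIndex.from_frame(df1[["B", "C"]])
keys2 = pd.MultiIndex.from_frame(df2[["B", "C"]])
df3 = df1[keys1.isin(keys2)]

print(loose.index.tolist())
print(df3.index.tolist())
```

The pairwise version also keeps the original index labels of df1, as in the desired df3.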
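Answer 1 (score: 1)
IIUC. I usually do this in R, and it seems to work in Python as well: concatenate B and C into a single string key and test membership on that key.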
df1.loc[df1[['B','C']].astype(str).sum(1).isin(df2[['B','C']].astype(str).sum(1)),:]
Out[75]:
A B C
1 15:00:00 2002-01-13 9
3 15:30:00 2002-01-13 9
5 16:00:00 2002-01-13 9
10 15:00:00 2002-01-15 38
12 15:30:00 2002-01-15 38
14 16:00:00 2002-01-15 38
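One caveat with a concatenated string key: without a separator, different (B, C) pairs can produce the same string. That cannot happen with the fixed-width dates in the question, but as a minimal sketch on made-up values (the frames and the "|" separator are assumptions for illustration), adding a separator that does not occur in the data avoids the collision:

```python
import pandas as pd

# Made-up values chosen so that plain concatenation collides:
# "a1" + "2" and "a" + "12" both give "a12".
df1 = pd.DataFrame({"B": ["a1", "a"], "C": ["2", "12"]})
df2 = pd.DataFrame({"B": ["a"], "C": ["12"]})

# Key without a separator: both rows of df1 look like a match.
plain1 = df1["B"].astype(str) + df1["C"].astype(str)
plain2 = df2["B"].astype(str) + df2["C"].astype(str)
collided = df1.loc[plain1.isin(plain2)]

# Key with a separator assumed absent from the data: only the true pair matches.
key1 = df1["B"].astype(str) + "|" + df1["C"].astype(str)
key2 = df2["B"].astype(str) + "|" + df2["C"].astype(str)
matched = df1.loc[key1.isin(key2)]

print(collided.index.tolist())
print(matched.index.tolist())
```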
Answer 2 (score: 1)
I would do it like this:
df3 = pd.merge(df1, df2, on=['B', 'C'])
which gives the following:
A_x B C A_y
0 15:00:00 2002-01-13 9 16:00:00
1 15:30:00 2002-01-13 9 16:00:00
2 16:00:00 2002-01-13 9 16:00:00
3 15:00:00 2002-01-15 38 16:00:00
4 15:30:00 2002-01-15 38 16:00:00
5 16:00:00 2002-01-15 38 16:00:00
Some cleanup is needed:
df3.drop('A_y', axis=1, inplace=True)
df3.columns = ['A', 'B', 'C']
Result:
A B C
0 15:00:00 2002-01-13 9
1 15:30:00 2002-01-13 9
2 16:00:00 2002-01-13 9
3 15:00:00 2002-01-15 38
4 15:30:00 2002-01-15 38
5 16:00:00 2002-01-15 38
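The merge-based results are numbered 0 to 5, while the df3 requested in the question keeps df1's labels (1, 3, 5, 10, 12, 14). A possible sketch for keeping those labels, assuming the sample data from the question: carry the index through the merge as a column, and merge only against the key columns of df2 so that no A_y column needs to be dropped afterwards.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "A": ["15:00:00", "15:00:00", "15:30:00", "15:30:00", "16:00:00", "16:00:00",
          "15:00:00", "15:00:00", "15:30:00", "15:30:00",
          "15:00:00", "15:00:00", "15:30:00", "15:30:00", "16:00:00", "16:00:00"],
    "B": ["2002-01-13"] * 6 + ["2002-01-14"] * 4 + ["2002-01-15"] * 6,
    "C": [8, 9] * 3 + [17, 19] * 2 + [38, 40] * 3,
})
df2 = pd.DataFrame({
    "A": ["16:00:00", "16:00:00"],
    "B": ["2002-01-13", "2002-01-15"],
    "C": [9, 38],
})

# Use only the key columns of df2, de-duplicated so that repeated
# keys in df2 cannot multiply the rows of df1.
keys = df2[["B", "C"]].drop_duplicates()

df3 = (
    df1.reset_index()            # keep the original labels in a column named "index"
       .merge(keys, on=["B", "C"])
       .set_index("index")       # restore the original labels
       .rename_axis(None)        # drop the leftover index name
       .sort_index()
)
print(df3)
```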