I have two DataFrames that contain some of the same rows.
In [31]: df1
Out[31]:
member time
0 0 2009-09-30 12:00:00
1 0 2009-09-30 18:00:00
2 0 2009-10-01 00:00:00
3 1 2009-09-30 12:00:00
4 1 2009-09-30 18:00:00
5 2 2009-09-30 12:00:00
6 3 2009-09-30 12:00:00
...
In [32]: df2
Out[32]:
member time
0 0 2009-09-30 12:00:00
1 0 2009-09-30 18:00:00
3 1 2009-09-30 12:00:00
4 2 2009-09-30 12:00:00
5 2 2009-09-30 18:00:00
6 2 2009-10-01 00:00:00
...
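I want to filter out the rows of df1 and df2 whose 'member' and 'time' values appear in only one of them, and get a DataFrame containing only the rows whose 'member' and 'time' values occur in both df1 and df2, i.e.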
In [33]: df_duplicated_1_and_2
Out[33]:
member time
0 0 2009-09-30 12:00:00
1 0 2009-09-30 18:00:00
3 1 2009-09-30 12:00:00
4 2 2009-09-30 12:00:00
...
Is there an efficient and elegant way to do this?
Update: If possible, I would prefer a filtered DataFrame rather than a new merged one. For example,
In [34]: df1
Out[34]:
member time value
0 0 2009-09-30 12:00:00 a
1 0 2009-09-30 18:00:00 b
2 0 2009-10-01 00:00:00 c
3 1 2009-09-30 12:00:00 d
4 1 2009-09-30 18:00:00 e
5 2 2009-09-30 12:00:00 f
6 3 2009-09-30 12:00:00 g
...
In [35]: df1_filtered_out
Out[35]:
member time value
0 0 2009-09-30 12:00:00 a
1 0 2009-09-30 18:00:00 b
3 1 2009-09-30 12:00:00 d
5 2 2009-09-30 12:00:00 f
...
and likewise a filtered df2.
Answer 0 (score: 4)
Do an inner join on the member and time columns:
>>> df1.merge(df2, on=['member', 'time'], how='inner')
member time
0 0 2009-09-30 12:00:00
1 0 2009-09-30 18:00:00
2 1 2009-09-30 12:00:00
3 2 2009-09-30 12:00:00
This produces a result containing only the rows whose member and time values match in both DataFrames.
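For anyone who wants to reproduce this, here is a minimal runnable sketch of the inner join. The frames are rebuilt from only the rows shown in the question (the rows elided with "..." are left out), and the `time` column is assumed to be a datetime column.

```python
import pandas as pd

# Rebuilt from the rows visible in the question; elided rows are not included.
df1 = pd.DataFrame({
    "member": [0, 0, 0, 1, 1, 2, 3],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        "2009-09-30 12:00", "2009-09-30 18:00",
        "2009-09-30 12:00", "2009-09-30 12:00",
    ]),
})
df2 = pd.DataFrame(
    {
        "member": [0, 0, 1, 2, 2, 2],
        "time": pd.to_datetime([
            "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
            "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        ]),
    },
    index=[0, 1, 3, 4, 5, 6],  # index labels as printed in the question
)

# Inner join on both key columns: only (member, time) pairs present in
# both frames survive. The result gets a fresh 0..n-1 index.
common = df1.merge(df2, on=["member", "time"], how="inner")
print(common)
```

Note that the merged result is a new frame with a new index, so the original row labels of df1 and df2 are not kept.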
Update:
>>> df1.merge(df2[['member', 'time']])
member time value
0 0 2009-09-30 12:00:00 a
1 0 2009-09-30 18:00:00 b
2 1 2009-09-30 12:00:00 d
3 2 2009-09-30 12:00:00 f
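If the original index labels should be kept, as in the question's df1_filtered_out, one possible alternative is a boolean mask built from the key columns instead of a merge. The sketch below is not part of the answer above; `filter_common` is a hypothetical helper name, and the data is again rebuilt from only the rows shown in the question.

```python
import pandas as pd

keys = ["member", "time"]

# Rebuilt from the rows visible in the question; elided rows are not included.
df1 = pd.DataFrame({
    "member": [0, 0, 0, 1, 1, 2, 3],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        "2009-09-30 12:00", "2009-09-30 18:00",
        "2009-09-30 12:00", "2009-09-30 12:00",
    ]),
    "value": list("abcdefg"),
})
df2 = pd.DataFrame(
    {
        "member": [0, 0, 1, 2, 2, 2],
        "time": pd.to_datetime([
            "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
            "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        ]),
    },
    index=[0, 1, 3, 4, 5, 6],  # index labels as printed in the question
)


def filter_common(left, right, on):
    """Hypothetical helper: keep rows of `left` whose `on` values occur in `right`."""
    # Turn the key columns into an index on each side and test membership.
    mask = left.set_index(on).index.isin(right.set_index(on).index)
    return left[mask]


df1_filtered = filter_common(df1, df2, keys)
df2_filtered = filter_common(df2, df1, keys)
print(df1_filtered)
print(df2_filtered)
```

Unlike the merge, this keeps each frame's own index and columns, and it does not multiply rows when the other frame contains the same (member, time) pair more than once.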