Select rows of one df based on conditions from a different df

Time: 2018-03-06 17:39:20

Tags: python pandas

I have df1:

           A           B   C
0   15:00:00  2002-01-13   8
1   15:00:00  2002-01-13   9
2   15:30:00  2002-01-13   8
3   15:30:00  2002-01-13   9
4   16:00:00  2002-01-13   8
5   16:00:00  2002-01-13   9
6   15:00:00  2002-01-14  17
7   15:00:00  2002-01-14  19
8   15:30:00  2002-01-14  17
9   15:30:00  2002-01-14  19
10  15:00:00  2002-01-15  38
11  15:00:00  2002-01-15  40
12  15:30:00  2002-01-15  38
13  15:30:00  2002-01-15  40
14  16:00:00  2002-01-15  38
15  16:00:00  2002-01-15  40

and df2:

           A           B   C
 0  16:00:00  2002-01-13   9
 1  16:00:00  2002-01-15  38

I want a new df3 that selects the rows of df1 where:

  • df1["B"] == df2["B"] and df1["C"] == df2["C"] (that is, the (B, C) pair appears in some row of df2)

df3 should be:

           A           B  C
1   15:00:00  2002-01-13  9  # df1["B"] == df2["B"] and df1["C"] == df2["C"]
3   15:30:00  2002-01-13  9
5   16:00:00  2002-01-13  9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

3 answers:

Answer 0 (score: 2)

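You can use pd.merge (suffixes=('','_y') keeps df1's A column named A and renames df2's copy to A_y, which is then dropped):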

df1.merge(df2, on=['B','C'],suffixes=('','_y')).drop('A_y',axis=1)

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38
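Note that merge returns a fresh 0..5 index, while the expected df3 keeps df1's original labels (1, 3, 5, 10, 12, 14). A minimal sketch that preserves them, assuming string-typed columns as in the tables: carry the index through the merge as a column, and merge only against df2's de-duplicated key columns so no A_y column appears and duplicate (B, C) pairs in df2 cannot duplicate rows of df1.

```python
import pandas as pd

# Small stand-in for the question's frames (a subset of the rows shown above).
df1 = pd.DataFrame({
    "A": ["15:00:00", "15:00:00", "16:00:00", "16:00:00"],
    "B": ["2002-01-13", "2002-01-13", "2002-01-15", "2002-01-15"],
    "C": [8, 9, 38, 40],
}, index=[0, 1, 14, 15])
df2 = pd.DataFrame({"A": ["16:00:00", "16:00:00"],
                    "B": ["2002-01-13", "2002-01-15"],
                    "C": [9, 38]})

# reset_index() turns df1's index into a column named "index"; the inner merge
# keeps df1's row order; set_index() restores the original labels afterwards.
keys = df2[["B", "C"]].drop_duplicates()
df3 = (df1.reset_index()
          .merge(keys, on=["B", "C"])
          .set_index("index"))
df3.index.name = None
print(df3)
```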

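Another option is boolean indexing. Note that this tests B and C independently, so in general it also keeps rows whose B value and C value each occur in df2 but never together in the same row; it gives the right answer for this data: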

df1[df1.B.isin(df2.B) & df1.C.isin(df2.C)]

           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

Answer 1 (score: 1)

IIUC. This is how I usually do it in R, and it seems to work in Python too:

df1.loc[df1[['B','C']].astype(str).sum(1).isin(df2[['B','C']].astype(str).sum(1)),:]
           A           B   C
1   15:00:00  2002-01-13   9
3   15:30:00  2002-01-13   9
5   16:00:00  2002-01-13   9
10  15:00:00  2002-01-15  38
12  15:30:00  2002-01-15  38
14  16:00:00  2002-01-15  38

Answer 2 (score: 1)

I would do it like this:

df3 = pd.merge(df1, df2, on=['B', 'C'])

which gives:

        A_x           B   C       A_y
0  15:00:00  2002-01-13   9  16:00:00
1  15:30:00  2002-01-13   9  16:00:00
2  16:00:00  2002-01-13   9  16:00:00
3  15:00:00  2002-01-15  38  16:00:00
4  15:30:00  2002-01-15  38  16:00:00
5  16:00:00  2002-01-15  38  16:00:00

It needs some cleanup:

df3.drop('A_y', axis=1, inplace=True)
df3.columns = ['A', 'B', 'C']

Result:

          A           B   C
0  15:00:00  2002-01-13   9
1  15:30:00  2002-01-13   9
2  16:00:00  2002-01-13   9
3  15:00:00  2002-01-15  38
4  15:30:00  2002-01-15  38
5  16:00:00  2002-01-15  38
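Assigning df3.columns = ['A', 'B', 'C'] relies on the column order after the merge. A sketch of the same cleanup that renames by name instead, so it keeps working if the column order changes; the small frames here are hypothetical stand-ins for the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["15:00:00", "16:00:00"],
                    "B": ["2002-01-13", "2002-01-13"],
                    "C": [9, 8]})
df2 = pd.DataFrame({"A": ["16:00:00"], "B": ["2002-01-13"], "C": [9]})

df3 = pd.merge(df1, df2, on=["B", "C"])  # columns: A_x, B, C, A_y
df3 = (df3.drop(columns="A_y")           # drop df2's copy of A
          .rename(columns={"A_x": "A"}))  # rename by name, not by position
print(df3.columns.tolist())  # ['A', 'B', 'C']
```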