Returning multiple records from a DataFrame column comparison

Time: 2019-06-25 10:21:54

Tags: python pandas

I have a DataFrame (df) that looks like this:

    Column_A    Column_B    Column_C
0   01/11/2010  01/07/2016  10/07/2001
1   22/04/2014  02/04/2015  04/02/2015
2   08/01/2007  01/06/2015  06/01/2015
3   15/11/2017  04/01/2016  20/01/2014
4   09/10/2000  01/09/2015  09/01/2015
5   04/09/2006  25/03/2016  25/03/2016
6   21/09/2015  01/07/2016  21/09/2015
7   18/02/2003  12/02/2016  02/12/2016
8   15/07/2014  14/12/2015  16/07/2007
9   05/05/2014  01/10/2015  05/06/2014
10  26/11/2013  26/11/2013  26/11/2013
11  03/09/2009  26/03/2015  26/03/2015
12  12/05/2015  12/05/2015  05/12/2015
13  27/10/2018  02/04/2014  04/02/2014
14  15/02/2016  15/02/2016  15/02/2016

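I am trying to return the records where Column_A > Column_B or Column_A > Column_C. For information, more date fields may eventually be added to the comparison.

So in this example I would return: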

    Column_A    Column_B    Column_C
0   01/11/2010  01/07/2016  10/07/2001
1   15/11/2017  04/01/2016  20/01/2014
2   15/07/2014  14/12/2015  16/07/2007
3   27/10/2018  02/04/2014  04/02/2014

To get this output, I tried:

IncorrectOrder = df[df['Column_A']>df['Column_B'] or df['Column_A']>df['Column_C']]

But I only get back the records where df['Column_A']>df['Column_B']...

Please let me know what I am doing wrong.

Thanks

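1 Answer:

Answer 0: (score: 1)

Wrap each comparison in parentheses and change or to the bitwise OR operator, |. The parentheses are required because | binds more tightly than >. The columns are converted to datetimes first so that they compare as dates rather than as strings: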

import pandas as pd

# parse every column as a day-first date (DD/MM/YYYY)
df = df.apply(pd.to_datetime, dayfirst=True)

IncorrectOrder = df[(df['Column_A']>df['Column_B']) | ( df['Column_A']>df['Column_C'])]  
print (IncorrectOrder)
     Column_A   Column_B   Column_C
0  2010-11-01 2016-07-01 2001-07-10
3  2017-11-15 2016-01-04 2014-01-20
8  2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04
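
As a minimal sketch of why the original attempt fails, the snippet below rebuilds three rows of the sample data (original indexes 0, 1 and 3) and compares or with |. It assumes a reasonably recent pandas, where applying or to two multi-row Series raises a ValueError because the truth value of a Series is ambiguous. The explicit format="%d/%m/%Y" is an assumption used here in place of dayfirst=True; for this data both should parse the dates the same way.

```python
import pandas as pd

# Three rows copied from the question's sample data (original indexes 0, 1, 3).
df = pd.DataFrame({
    "Column_A": ["01/11/2010", "22/04/2014", "15/11/2017"],
    "Column_B": ["01/07/2016", "02/04/2015", "04/01/2016"],
    "Column_C": ["10/07/2001", "04/02/2015", "20/01/2014"],
})

# Parse the strings as dates; without this step the comparison is
# done on text, character by character, not on calendar dates.
df = df.apply(pd.to_datetime, format="%d/%m/%Y")

mask_b = df["Column_A"] > df["Column_B"]
mask_c = df["Column_A"] > df["Column_C"]

# Python's `or` needs a single True/False value for its left operand,
# which a multi-row boolean Series cannot provide.
try:
    mask_b or mask_c
    or_raised = False
except ValueError:
    or_raised = True

# `|` combines the two masks element by element instead.
combined = mask_b | mask_c
result = df[combined]
print(result)
```

Building each mask separately, as above, also avoids the precedence problem: written inline without parentheses, | would be evaluated before either > comparison.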

If more columns need to be compared, a more general solution is:

IncorrectOrder = df[(df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1)]
# alternative: compare all remaining columns with the first one
#IncorrectOrder = df[(df.iloc[:, 1:].lt(df['Column_A'], axis=0)).any(axis=1)]
print (IncorrectOrder)
     Column_A   Column_B   Column_C
0  2010-11-01 2016-07-01 2001-07-10
3  2017-11-15 2016-01-04 2014-01-20
8  2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04

Details: first compare the columns with DataFrame.lt (the method form of <):

print (df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0))
    Column_B  Column_C
0      False      True
1      False     False
2      False     False
3       True      True
4      False     False
5      False     False
6      False     False
7      False     False
8      False      True
9      False     False
10     False     False
11     False     False
12     False     False
13      True      True
14     False     False

Then use DataFrame.any to check whether each row contains at least one True:

print ((df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1))
0      True
1     False
2     False
3      True
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
12    False
13     True
14    False
dtype: bool
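
Since the question mentions that more date fields may be added later, the two steps above could be wrapped in a small helper. This is only a sketch: the function name rows_where_reference_is_later and its parameters are hypothetical, not part of pandas or of the original answer, and the four sample rows are a subset of the question's data (original indexes 0, 1, 10 and 13).

```python
import pandas as pd


def rows_where_reference_is_later(frame, reference, others):
    """Return rows where `reference` is later than at least one column in `others`.

    Hypothetical helper: `frame` is expected to hold datetime columns already.
    """
    # Element-wise "other < reference", aligned row by row (axis=0),
    # then keep rows where any of the compared columns is True.
    mask = frame[others].lt(frame[reference], axis=0).any(axis=1)
    return frame[mask]


raw = pd.DataFrame({
    "Column_A": ["01/11/2010", "22/04/2014", "26/11/2013", "27/10/2018"],
    "Column_B": ["01/07/2016", "02/04/2015", "26/11/2013", "02/04/2014"],
    "Column_C": ["10/07/2001", "04/02/2015", "26/11/2013", "04/02/2014"],
})
dates = raw.apply(pd.to_datetime, format="%d/%m/%Y")

incorrect_order = rows_where_reference_is_later(
    dates, "Column_A", ["Column_B", "Column_C"]
)

# The expected output in the question is numbered 0, 1, 2, ...;
# reset_index(drop=True) renumbers the filtered rows that way.
renumbered = incorrect_order.reset_index(drop=True)
print(renumbered)
```

Adding another date field then only means appending its name to the list passed as others. Rows where all dates are equal are excluded because lt is a strict comparison.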