I have a DataFrame (df) that looks like:
Column_A Column_B Column_C
0 01/11/2010 01/07/2016 10/07/2001
1 22/04/2014 02/04/2015 04/02/2015
2 08/01/2007 01/06/2015 06/01/2015
3 15/11/2017 04/01/2016 20/01/2014
4 09/10/2000 01/09/2015 09/01/2015
5 04/09/2006 25/03/2016 25/03/2016
6 21/09/2015 01/07/2016 21/09/2015
7 18/02/2003 12/02/2016 02/12/2016
8 15/07/2014 14/12/2015 16/07/2007
9 05/05/2014 01/10/2015 05/06/2014
10 26/11/2013 26/11/2013 26/11/2013
11 03/09/2009 26/03/2015 26/03/2015
12 12/05/2015 12/05/2015 05/12/2015
13 27/10/2018 02/04/2014 04/02/2014
14 15/02/2016 15/02/2016 15/02/2016
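I am trying to return the records where Column_A > Column_B, as well as those where Column_A > Column_C. For information, more date fields may eventually need to be compared.
So in this example, I would return: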
Column_A Column_B Column_C
0 01/11/2010 01/07/2016 10/07/2001
1 15/11/2017 04/01/2016 20/01/2014
2 15/07/2014 14/12/2015 16/07/2007
3 27/10/2018 02/04/2014 04/02/2014
To get this output, I tried:
IncorrectOrder = df[df['Column_A']>df['Column_B'] or df['Column_A']>df['Column_C']]
But I only get back the records where df['Column_A']>df['Column_B']...
Please let me know what I am doing wrong.
Thanks
Answer 0 (score: 1)
Add parentheses around each comparison and change or to the bitwise OR operator, |:
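import pandas as pd
# parse every column as day-first dates (DD/MM/YYYY)
df = df.apply(pd.to_datetime, dayfirst=True)
# keep rows where Column_A is later than Column_B or later than Column_C
IncorrectOrder = df[(df['Column_A']>df['Column_B']) | ( df['Column_A']>df['Column_C'])]
print (IncorrectOrder)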
Column_A Column_B Column_C
0 2010-11-01 2016-07-01 2001-07-10
3 2017-11-15 2016-01-04 2014-01-20
8 2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04
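Why the change matters can be sketched on a small frame. Python's or keyword asks each Series for a single truth value, which pandas refuses with a ValueError, while | combines the two masks element by element. The parentheses are needed because | binds more tightly than >. The three-row frame and the variable names below are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Illustrative three-row frame (rows 0, 1 and 3 of the question's data),
# parsed as day-first dates.
df = pd.DataFrame({
    "Column_A": ["01/11/2010", "22/04/2014", "15/11/2017"],
    "Column_B": ["01/07/2016", "02/04/2015", "04/01/2016"],
    "Column_C": ["10/07/2001", "04/02/2015", "20/01/2014"],
}).apply(pd.to_datetime, dayfirst=True)

a_gt_b = df["Column_A"] > df["Column_B"]
a_gt_c = df["Column_A"] > df["Column_C"]

# `or` needs one True/False per operand; a multi-element Series has none.
try:
    a_gt_b or a_gt_c
    or_failed = False
except ValueError:
    or_failed = True

# `|` works row by row and yields a boolean mask usable for indexing.
mask = a_gt_b | a_gt_c
result = df[mask]
print(or_failed)
print(result.index.tolist())
```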
If there may be more columns to compare:
IncorrectOrder = df[(df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1)]
#all columns compared with the first
#IncorrectOrder = df[(df.iloc[:, 1:].lt(df['Column_A'], axis=0)).any(axis=1)]
print (IncorrectOrder)
Column_A Column_B Column_C
0 2010-11-01 2016-07-01 2001-07-10
3 2017-11-15 2016-01-04 2014-01-20
8 2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04
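Since more date columns may be added later, the same idea can be wrapped in a small function. This is only a sketch: the helper name rows_where_base_is_later and its parameters are hypothetical, and the data is retyped from the question.

```python
import pandas as pd

# Sample data from the question, as day-first date strings.
rows = [
    ("01/11/2010", "01/07/2016", "10/07/2001"),
    ("22/04/2014", "02/04/2015", "04/02/2015"),
    ("08/01/2007", "01/06/2015", "06/01/2015"),
    ("15/11/2017", "04/01/2016", "20/01/2014"),
    ("09/10/2000", "01/09/2015", "09/01/2015"),
    ("04/09/2006", "25/03/2016", "25/03/2016"),
    ("21/09/2015", "01/07/2016", "21/09/2015"),
    ("18/02/2003", "12/02/2016", "02/12/2016"),
    ("15/07/2014", "14/12/2015", "16/07/2007"),
    ("05/05/2014", "01/10/2015", "05/06/2014"),
    ("26/11/2013", "26/11/2013", "26/11/2013"),
    ("03/09/2009", "26/03/2015", "26/03/2015"),
    ("12/05/2015", "12/05/2015", "05/12/2015"),
    ("27/10/2018", "02/04/2014", "04/02/2014"),
    ("15/02/2016", "15/02/2016", "15/02/2016"),
]
df = pd.DataFrame(rows, columns=["Column_A", "Column_B", "Column_C"])
df = df.apply(pd.to_datetime, dayfirst=True)


def rows_where_base_is_later(frame, base, others):
    """Hypothetical helper: keep rows where `base` is later than
    at least one of the columns listed in `others`."""
    earlier = frame[others].lt(frame[base], axis=0)
    return frame[earlier.any(axis=1)]


incorrect_order = rows_where_base_is_later(df, "Column_A", ["Column_B", "Column_C"])
print(incorrect_order.index.tolist())  # [0, 3, 8, 13]
```

Adding another date field then only means appending its name to the list passed as others.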
Details: first compare the columns with DataFrame.lt, the method form of <:
print (df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0))
Column_B Column_C
0 False True
1 False False
2 False False
3 True True
4 False False
5 False False
6 False False
7 False False
8 False True
9 False False
10 False False
11 False False
12 False False
13 True True
14 False False
Then use DataFrame.any to check whether each row has at least one True:
print ((df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1))
0 True
1 False
2 False
3 True
4 False
5 False
6 False
7 False
8 True
9 False
10 False
11 False
12 False
13 True
14 False
dtype: bool
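If the intent were instead that Column_A must be later than every other column, DataFrame.all is the counterpart of DataFrame.any. A minimal sketch, assuming a boolean frame copied from the first four rows of the comparison output above (the name cmp is illustrative):

```python
import pandas as pd

# First four rows of the boolean comparison printed above.
cmp = pd.DataFrame({
    "Column_B": [False, False, False, True],
    "Column_C": [True, False, False, True],
})

any_true = cmp.any(axis=1)  # at least one comparison holds in the row
all_true = cmp.all(axis=1)  # every comparison holds in the row

print(any_true.tolist())  # [True, False, False, True]
print(all_true.tolist())  # [False, False, False, True]
```

Row 0 passes any but not all, because only the Column_C comparison is True there.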