Question

I have a SQL file containing data that I read into pandas.

df = pandas.read_sql('Database count details', con=engine,
                     index_col='id', parse_dates=['newest_date_available'])

Output

id       code   newest_date_available
9793708  3514   2015-12-24
9792282  2399   2015-12-25
9797602  7452   2015-12-25
9804367  9736   2016-01-20
9804438  9870   2016-01-20

The next line of code gets the date from one week ago:

date_before = datetime.date.today() - datetime.timedelta(days=7) # Which is 2016-01-20

What I want to do is compare date_before with df and print all rows whose date is earlier than date_before:

if (df['newest_date_available'] < date_before): print(#all rows)

Obviously this gives me an error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

Answer 1

I would build a boolean mask like this:

a = df[df['newest_date_available'] < date_before]

With date_before = datetime.date(2016, 1, 19), this returns:

        id  code newest_date_available
0  9793708  3514            2015-12-24
1  9792282  2399            2015-12-25
2  9797602  7452            2015-12-25
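
For anyone who wants to try the mask without a database, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question by hand (an assumption, since the real data comes from read_sql) and wraps the cutoff in pd.Timestamp so the comparison does not depend on how a given pandas version treats datetime.date. The comparison produces one True/False value per row, which is why it can index the DataFrame but cannot be used directly in an if statement.

```python
import datetime

import pandas as pd

# Sample rows copied from the question; in the original they come from read_sql.
df = pd.DataFrame(
    {
        "id": [9793708, 9792282, 9797602, 9804367, 9804438],
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    }
)

date_before = datetime.date(2016, 1, 19)

# Element-wise comparison: the result is a boolean Series, one value per row.
mask = df["newest_date_available"] < pd.Timestamp(date_before)

# Indexing with the mask keeps only the rows where it is True.
a = df[mask]
print(a)
```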

Answer 2

Using datetime.date(2019, 1, 10) works because pandas coerces the date to a datetime internally. However, this will no longer be the case in future versions of pandas.

As of version 0.24, it emits a warning:

FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised.

A better solution is the one proposed in the official pandas documentation: use pd.Timestamp, the pandas replacement for Python's datetime.datetime object.

To give an example based on the OP's initial dataset, this is how you would use it:

import pandas as pd

# Boolean mask: True for rows dated before 10 January 2016
cond1 = df.newest_date_available < pd.Timestamp(2016, 1, 10)
df.loc[cond1]
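
The same idea can be applied to the question's "one week ago" cutoff. The sketch below is only an illustration: rows_before is a hypothetical helper name, the sample rows are retyped from the question, and "today" is pinned to 2016-01-27 (an assumption that makes the cutoff 2016-01-20, as in the question's comment) so the result is reproducible.

```python
import pandas as pd


def rows_before(frame, column, cutoff):
    """Return the rows of `frame` whose `column` is strictly earlier than `cutoff`.

    Hypothetical helper: `cutoff` may be a datetime.date, datetime.datetime,
    string, or pd.Timestamp; it is normalized to pd.Timestamp before comparing.
    """
    cutoff = pd.Timestamp(cutoff)
    return frame.loc[frame[column] < cutoff]


# Sample rows copied from the question, indexed by id as in the read_sql call.
df = pd.DataFrame(
    {
        "id": [9793708, 9792282, 9797602, 9804367, 9804438],
        "code": [3514, 2399, 7452, 9736, 9870],
        "newest_date_available": pd.to_datetime(
            ["2015-12-24", "2015-12-25", "2015-12-25", "2016-01-20", "2016-01-20"]
        ),
    }
).set_index("id")

# In real use this would be: today = pd.Timestamp.today().normalize()
today = pd.Timestamp(2016, 1, 27)  # fixed so the example is reproducible
date_before = today - pd.Timedelta(days=7)

old_rows = rows_before(df, "newest_date_available", date_before)
print(old_rows)
```

Because the comparison is strict, the two rows dated exactly 2016-01-20 are excluded; use <= in the helper if they should be included.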
