Not all dates are captured when filtering by date. Python pandas

Time: 2020-07-17 18:27:05

Tags: python pandas datetime timedelta relativedelta

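I'm filtering a DataFrame by date to produce two separate versions:

  1. Only today's data
  2. Data from the last two years

However, when I filter on the dates, some dates that fall within the last two years are missed.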

import pandas as pd
from datetime import datetime as dt
from dateutil.relativedelta import relativedelta

date_format = '%m-%d-%Y'  # desired date format

today = dt.now().strftime(date_format)  # today's date. Will always result in today's date
today = dt.strptime(today, date_format).date()  # converting 'today' into a date object

two_years = today - relativedelta(years=2)  # date is today's date minus two years.
two_years = two_years.strftime(date_format)
today = today.strftime(date_format)

# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)

df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]

This results in:

all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']

Even though 08-01-2019 is captured, 07-15-2020 is missing from the two-year data.
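A likely explanation, sketched below: after `.dt.strftime(date_format)` the column holds plain strings, so `>=` compares them character by character rather than as dates. The snippet assumes the script ran on 2020-07-17, so `two_years` was the string `'07-17-2018'`, and reuses the four dates from the output above.

```python
# Assumption: run date 2020-07-17, so two_years was formatted as '07-17-2018'.
two_years = '07-17-2018'
dates = ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015']

# str comparison is lexicographic: '07-15-2020' < '07-17-2018'
# because '5' < '7' at index 4, so it is dropped.
kept = [d for d in dates if d >= two_years]
print(kept)  # ['07-17-2020', '08-01-2019']

# With year-first strings ('%Y-%m-%d'), lexicographic order matches
# chronological order, so the same comparison keeps 2020-07-15.
iso_dates = ['2020-07-17', '2020-07-15', '2019-08-01', '2015-03-25']
kept_iso = [d for d in iso_dates if d >= '2018-07-17']
print(kept_iso)  # ['2020-07-17', '2020-07-15', '2019-08-01']
```

This reproduces the reported output exactly, which suggests the filter itself is fine and the string type is the problem.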

2 answers:

Answer 0 (score: 0)

Your data type conversion is the problem here. You can do this instead:

today = dt.now()  # today's date. Will always result in today's date
two_years = today - relativedelta(years=2)  # date is today's date minus two years. 

Printing `two_years` now gives '2018-07-17 18:40:42.704395'. You can then convert it to a date-only value:

two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
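A `date` object like this only helps if the column is compared as dates too, not as strings. A minimal sketch of how that could look, assuming the question's sample dates and a fixed reference date of 2020-07-17 (the answer uses `dt.now()`); `df_data` here is hypothetical sample data:

```python
import pandas as pd
from datetime import datetime as dt
from dateutil.relativedelta import relativedelta

# Hypothetical sample data matching the question; the reference date is
# fixed to 2020-07-17 so the result is reproducible.
df_data = pd.DataFrame({'date': ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015']})
today = dt(2020, 7, 17)

# .date() drops the time part directly; no strftime/strptime round trip.
two_years = (today - relativedelta(years=2)).date()

# Parse the column once, then compare its date part with date objects.
dates = pd.to_datetime(df_data['date'], format='%m-%d-%Y', errors='coerce')
df_today = df_data[dates.dt.date == today.date()]
df_two_year = df_data[dates.dt.date >= two_years]

print(df_two_year['date'].tolist())  # ['07-17-2020', '07-15-2020', '08-01-2019']
```

Using `.dt.date` on both sides keeps the comparison between like types; recent pandas versions do not allow ordering comparisons between a `datetime64` column and a bare `datetime.date`.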

Answer 1 (score: 0)

You don't need to convert anything to strings; just work with the datetime dtype. For example:

import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'], format='%m-%d-%Y')})

today = pd.Timestamp('now')  # the outputs below assume today is 2020-07-17

print(df[df['date'].dt.date == today.date()])
#         date
# 0 2020-07-17

print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
#         date
# 1 2020-07-15
# 2 2019-08-01

The comparison operations (adjust them as needed...) return boolean masks, which you can use directly to filter the df.
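Note that `dt.year >= today.year-1` keeps the current and previous calendar years, which is not exactly a rolling two-year window. If an exact window is wanted, one option (a sketch, not the answerer's code) is to subtract `pd.DateOffset(years=2)` from the reference date and combine the masks with `&` and `~`. The reference date is fixed to 2020-07-17 here as an assumption so the output is reproducible:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(
    ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015'], format='%m-%d-%Y')})

# Assumed fixed reference date; pd.Timestamp('now').normalize() would give
# the real "today" at midnight.
today = pd.Timestamp('2020-07-17')

# Boolean masks: same calendar day, and on/after exactly two years ago.
is_today = df['date'].dt.normalize() == today
in_last_two_years = df['date'] >= today - pd.DateOffset(years=2)

df_today = df[is_today]
df_two_year = df[in_last_two_years]
df_two_year_excl_today = df[in_last_two_years & ~is_today]

print(df_two_year_excl_today['date'].dt.strftime('%Y-%m-%d').tolist())
# ['2020-07-15', '2019-08-01']
```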