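I am filtering a DataFrame by date to produce two separate versions: one with the rows dated today and one with the rows from the last two years.
However, when I filter on the date, some dates that fall within the last two years are missing.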
from datetime import datetime as dt
from dateutil.relativedelta import relativedelta
import pandas as pd
date_format = '%m-%d-%Y' # desired date format
today = dt.now().strftime(date_format) # today's date as a formatted string
today = dt.strptime(today, date_format).date() # converting 'today' back into a date object
two_years = today - relativedelta(years=2) # today's date minus two years (computed while 'today' is still a date)
two_years = two_years.strftime(date_format)
today = today.strftime(date_format)
# normalizing the format of the date column to the desired format
df_data['date'] = pd.to_datetime(df_data['date'], errors='coerce').dt.strftime(date_format)
df_today = df_data[df_data['date'] == today]
df_two_year = df_data[df_data['date'] >= two_years]
This results in:
all dates ['07-17-2020' '07-15-2020' '08-01-2019' '03-25-2015']
today df ['07-17-2020']
two year df ['07-17-2020' '08-01-2019']
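Even though 08-01-2019 is captured, 07-15-2020 is missing from the two-year DataFrame.
Answer 0: (score: 0)
Your data type conversion is the problem here. You can do this instead: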
today = dt.now() # today's date. Will always result in today's date
two_years = today - relativedelta(years=2) # date is today's date minus two years.
Printing two_years at this point gives '2018-07-17 18:40:42.704395'. You can then convert it to a date-only value:
two_years = two_years.strftime(date_format)
two_years = dt.strptime(two_years, date_format).date()
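To see why the conversion matters: strings in '%m-%d-%Y' format are compared character by character, so the month is compared before the year. Below is a minimal sketch using the dates from the question. The cutoff is hard-coded to 07-17-2018 (an assumption, so the result does not depend on the day you run it), and the names kept_as_strings and kept_as_dates are only illustrative.

```python
from datetime import datetime

date_format = '%m-%d-%Y'
dates = ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015']
cutoff_str = '07-17-2018'  # assumed fixed cutoff standing in for "today minus two years"

# Comparing the formatted strings is lexicographic: month first, then day, then year.
kept_as_strings = [d for d in dates if d >= cutoff_str]

# Parsing both sides into date objects compares them chronologically.
cutoff_date = datetime.strptime(cutoff_str, date_format).date()
kept_as_dates = [
    d for d in dates
    if datetime.strptime(d, date_format).date() >= cutoff_date
]

print(kept_as_strings)  # ['07-17-2020', '08-01-2019']
print(kept_as_dates)    # ['07-17-2020', '07-15-2020', '08-01-2019']
```

The string version drops 07-15-2020 because '07-15' sorts before '07-17', which matches the output shown in the question.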
Answer 1: (score: 0)
You don't need to convert anything to strings; just use the datetime dtype. For example:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['07-17-2020','07-15-2020','08-01-2019','03-25-2015'])})
today = pd.Timestamp('now')
print(df[df['date'].dt.date == today.date()])
# date
# 0 2020-07-17
print(df[(df['date'].dt.year >= today.year-1) & (df['date'].dt.date != today.date())])
# date
# 1 2020-07-15
# 2 2019-08-01
The results of the comparison operations (adjust them as needed...) are boolean masks, which work well for filtering the df.
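If you want an exact two-year window rather than a comparison on the year, one possible adjustment is to subtract a pd.DateOffset from today and compare the datetime column against it. This is only a sketch: the reference date is fixed to 2020-07-17 (an assumption for reproducibility; in practice you would probably use pd.Timestamp('now').normalize()), and the mask names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(
    ['07-17-2020', '07-15-2020', '08-01-2019', '03-25-2015'],
    format='%m-%d-%Y')})

# Assumed fixed reference date so the example is reproducible.
today = pd.Timestamp('2020-07-17')
two_years_ago = today - pd.DateOffset(years=2)

# Each comparison yields a boolean Series (a mask) aligned with df.
is_today = df['date'].dt.normalize() == today
in_window = df['date'] >= two_years_ago

df_today = df[is_today]
df_two_year = df[in_window]

print(df_today)
print(df_two_year)
```

Because the column stays in datetime dtype, 07-15-2020 is kept in the two-year frame. Format the dates with dt.strftime only when displaying or exporting them.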