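I'm trying to convert a pandas date column to YYYY-MM-DD and then restrict the dates based on a parameter, min_date.
The problem is that when converting the dd/mm/yyyy dates in the df to the desired format, pandas assumes the dates are US-style (month first), so 06/07/2020 is converted to 2020-06-07 instead of 2020-07-06.
I searched Stack Overflow but couldn't find anyone with a similar problem. The usual answer is just to use strftime, but that doesn't work for my use case.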
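The ambiguity can be reproduced without the website. A minimal sketch, assuming a recent pandas version (the variable names are illustrative only), that parses the same string with and without `dayfirst`:

```python
import pandas as pd

# By default pandas reads an ambiguous "xx/xx/yyyy" string month-first (US style).
us_style = pd.to_datetime("06/07/2020")
# dayfirst=True asks the parser to treat the first field as the day.
eu_style = pd.to_datetime("06/07/2020", dayfirst=True)

print(us_style.strftime("%Y-%m-%d"))  # 2020-06-07 (7 June)
print(eu_style.strftime("%Y-%m-%d"))  # 2020-07-06 (6 July)
```

This is why calling strftime afterwards cannot help: the day and month are already swapped at parse time, before any formatting happens.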
Code snippet:
import pandas as pd
from datetime import datetime
#date restriction
min_date = datetime.strptime(str("2020-06-01"),"%Y-%m-%d").date()
#convert html from website to DF
df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]
#only use certain columns
df = df[["Subject","Date of case"]]
df.rename(columns={"Subject": "Description", "Date of case": "Date" }, inplace=True)
#adding another date column to easily compare the old vs new date
df["Date1"] = pd.to_datetime(df["Date"]).dt.strftime('%Y-%m-%d')
df = df[df["Date1"] > min_date.strftime('%Y-%m-%d')]
print(df[["Date","Date1","Description"]])
Thanks in advance for your help.
Answer:
I think you need to compare datetimes rather than their string representations. You also need to add the dayfirst=True parameter to to_datetime, and change the format in Series.dt.strftime to DD/MM/YYYY:
import pandas as pd
from datetime import datetime

#date restriction
min_date = datetime.strptime("2020-06-01", "%Y-%m-%d")
#convert html from website to DF
u = "https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList"
df = pd.read_html(u)[0]
#only use certain columns
#rename on the selection itself instead of in place, avoiding SettingWithCopyWarning
df = df[["Subject", "Date of case"]].rename(
    columns={"Subject": "Description", "Date of case": "Date"})
#adding another date column to easily compare the old vs new date
#converted to datetimes
df["Date1"] = pd.to_datetime(df["Date"], dayfirst=True)
#compare
df = df[df["Date1"] > min_date].copy()
#or compare by string
#df = df[df["Date1"] > "2020-06-01"].copy()
#change to custom format
df["Date1"] = df["Date1"].dt.strftime('%d/%m/%Y')
print(df[["Date","Date1","Description"]])
    Date        Date1       Description
0   06/07/2020  06/07/2020  presence of lactose (>1 250 mg/kg - ppm) in la...
1   06/07/2020  06/07/2020  high content of vitamin B9 - folic acid (1388 ...
2   06/07/2020  06/07/2020  high content of vitamin B9 - folic acid (1189 ...
3   06/07/2020  06/07/2020  aflatoxins (B1 = 25.4 µg/kg - ppb) in groundnu...
4   06/07/2020  06/07/2020  poor temperature control (-10.2 °C) of frozen ...
..         ...         ...                                                ...
95  24/06/2020  24/06/2020  unauthorised genetically modified micro-organi...
96  24/06/2020  24/06/2020  Salmonella (in 2 out of 5 samples /25g) in pro...
97  24/06/2020  24/06/2020  horse which has not undergone sufficient withd...
98  24/06/2020  24/06/2020  unauthorised genetically modified (positive fo...
99  24/06/2020  24/06/2020  2,4-dinitrophenol (DNP) offered online for sale

[100 rows x 3 columns]
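A variation on the approach above, sketched offline so it runs without fetching the RASFF page. It swaps dayfirst=True for an explicit format="%d/%m/%Y" (the pandas docs describe dayfirst as a preference rather than a strict rule), keeps min_date as a pandas Timestamp so datetimes are compared with datetimes, and formats the result as YYYY-MM-DD as the question asked. The sample rows are hypothetical stand-ins; only the dd/mm/yyyy layout is taken from the real table.

```python
import pandas as pd

# Hypothetical sample standing in for the RASFF table (no network access needed);
# the "Date of case" strings are day-first, as on the live page.
raw = pd.DataFrame({
    "Subject": ["presence of lactose in ...", "Salmonella in ...", "older notification"],
    "Date of case": ["06/07/2020", "24/06/2020", "15/05/2020"],
})

# Select and rename in one step, so no slice is modified in place.
df = raw[["Subject", "Date of case"]].rename(
    columns={"Subject": "Description", "Date of case": "Date"}
)

# An explicit format removes any guessing about day/month order.
df["Date1"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Filter on real datetimes; min_date is a Timestamp, not a string.
min_date = pd.Timestamp("2020-06-01")
df = df[df["Date1"] > min_date].copy()

# Convert back to text only after filtering, purely for display.
df["Date1"] = df["Date1"].dt.strftime("%Y-%m-%d")
print(df[["Date", "Date1", "Description"]])
```

With this sample the 15/05/2020 row is dropped, and 06/07/2020 becomes 2020-07-06 rather than 2020-06-07. If a string does not match the format, to_datetime raises an error instead of silently swapping day and month, which is usually what you want for data with a known layout.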