pandas date conversion assumes the wrong date format

Time: 2020-07-07 11:50:36

Tags: python python-3.x pandas dataframe

I am trying to convert a pandas date column to YYYY-MM-DD and then filter the rows against a min_date parameter.

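The problem is that when converting the dd/mm/yyyy values in the df to the desired format, pandas assumes the dates are US-style (month first), so 06/07/2020 becomes 2020-06-07 instead of 2020-07-06.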
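The behaviour can be reproduced without the website. This is a minimal sketch using a single hypothetical value rather than the scraped column:

```python
import pandas as pd

# Hypothetical sample value; the real data comes from the scraped table.
raw = "06/07/2020"  # intended as 6 July 2020 (dd/mm/yyyy)

# With default settings pandas reads the first field as the month.
parsed = pd.to_datetime(raw)
print(parsed.strftime("%Y-%m-%d"))  # 2020-06-07, i.e. 7 June
```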

I searched Stack Overflow but could not find anyone with a similar problem. The usual answer is simply to use strftime, but that does not work for my use case.

Code snippet:

import pandas as pd
from datetime import datetime

#date restriction
min_date = datetime.strptime(str("2020-06-01"),"%Y-%m-%d").date()

#convert html from website to DF
df = pd.read_html("https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]

#only use certain columns
df = df[["Subject","Date of case"]]
df.rename(columns={"Subject": "Description", "Date of case": "Date" }, inplace=True)

#adding another date column to easily compare the old vs new date
df["Date1"] = pd.to_datetime(df["Date"]).dt.strftime('%Y-%m-%d')
df = df[df["Date1"] > min_date.strftime('%Y-%m-%d')]

print(df[["Date","Date1","Description"]])

Thanks in advance for your help.

1 answer:

Answer 0: (score: 0)

I think you need to compare datetimes rather than their string representations. You also need to pass dayfirst=True to to_datetime, and change the format in Series.dt.strftime to DD/MM/YYYY:

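import pandas as pd
from datetime import datetime

#date restriction, kept as a datetime so it can be compared with the column
min_date = datetime.strptime("2020-06-01", "%Y-%m-%d")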

#convert html from website to DF
u = "https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList"
df = pd.read_html(u)[0]

#only use certain columns
df = df[["Subject","Date of case"]]
df.rename(columns={"Subject": "Description", "Date of case": "Date" }, inplace=True)

#adding another date column to easily compare the old vs new date

#converted to datetimes
df["Date1"] = pd.to_datetime(df["Date"], dayfirst=True)
#compare
df = df[df["Date1"] > min_date].copy()
#or compare by string
#df = df[df["Date1"] > "2020-06-01"].copy()
#change to custom format
df["Date1"] = df["Date1"].dt.strftime('%d/%m/%Y')

print(df[["Date","Date1","Description"]])
          Date       Date1                                        Description
0   06/07/2020  06/07/2020  presence of lactose (>1 250 mg/kg - ppm) in la...
1   06/07/2020  06/07/2020  high content of vitamin B9 - folic acid (1388 ...
2   06/07/2020  06/07/2020  high content of vitamin B9 - folic acid (1189 ...
3   06/07/2020  06/07/2020  aflatoxins (B1 = 25.4 µg/kg - ppb) in groundnu...
4   06/07/2020  06/07/2020  poor temperature control (-10.2 °C) of frozen ...
..         ...         ...                                                ...
95  24/06/2020  24/06/2020  unauthorised genetically modified micro-organi...
96  24/06/2020  24/06/2020  Salmonella (in 2 out of 5 samples /25g) in pro...
97  24/06/2020  24/06/2020  horse which has not undergone sufficient withd...
98  24/06/2020  24/06/2020  unauthorised genetically modified (positive fo...
99  24/06/2020  24/06/2020    2,4-dinitrophenol (DNP) offered online for sale

[100 rows x 3 columns]
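If the column is known to always be dd/mm/yyyy, passing an explicit format is an alternative to dayfirst=True, because it leaves nothing for the parser to guess. The sketch below assumes a small hypothetical frame in place of the scraped table, and it writes the final column as YYYY-MM-DD, the format the question asked for:

```python
import pandas as pd

# Hypothetical stand-in for the scraped "Date" column (dd/mm/yyyy strings).
sample = pd.DataFrame({
    "Date": ["06/07/2020", "24/06/2020", "15/05/2020"],
    "Description": ["a", "b", "c"],
})

# An explicit format removes any guessing about day/month order.
sample["Date1"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

# Compare as datetimes, then format for display only at the very end.
min_date = pd.Timestamp("2020-06-01")
recent = sample[sample["Date1"] > min_date].copy()
recent["Date1"] = recent["Date1"].dt.strftime("%Y-%m-%d")

print(recent)
```

Only the July and June rows remain after the filter. Formatting to strings is left until after the comparison because dd/mm/yyyy strings do not sort chronologically.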