I have a dataset that I imported from Excel with pandas. One column holds the date and time as strings in this format:
16-MAR-16 11.35.27.000000000 AM
05-APR-16 05.21.14.000000000 PM
16-FEB-16 09.56.36.000000000 AM
16-MAR-16 11.35.27.000000000 AM
16-MAR-16 09.28.11.000000000 AM
19-MAY-16 03.50.38.000000000 PM
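I want to split this data into separate date and time values. I went through several similar questions but could not find an answer.
Here is the code I tried: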
(1)df["timestamp"] = pd.to_datetime(df['Invoice Date'],dayfirst = True)
(Error)File "C:\Users\admin\Anaconda3\lib\site-packages\dateutil\parser.py", line 559, in parse
raise ValueError("Unknown string format")
ValueError: Unknown string format
(2)from datetime import datetime
df["timestamp"] = df["Invoice Date"].apply(lambda x: datetime.strptime(x,"dd-mmm-yy hh.mm.ss.%f aa"))
(Error) ValueError: time data '16-MAR-16 11.35.27.000000000 AM' does not match format 'dd-mmm-yy hh.mm.ss.%f aa'
Please help.
Answer 0 (score: 4)
Use the format parameter of pd.to_datetime. The strings use a 12-hour clock with an AM/PM marker, so the hour has to be parsed with %I. With %H the %p marker has no effect and PM times stay in the morning:
df["timestamp"] = pd.to_datetime(df['Invoice Date'], format='%d-%b-%y %I.%M.%S.%f %p')
print (df)
                      Invoice Date           timestamp
0  16-MAR-16 11.35.27.000000000 AM 2016-03-16 11:35:27
1  05-APR-16 05.21.14.000000000 PM 2016-04-05 17:21:14
2  16-FEB-16 09.56.36.000000000 AM 2016-02-16 09:56:36
3  16-MAR-16 11.35.27.000000000 AM 2016-03-16 11:35:27
4  16-MAR-16 09.28.11.000000000 AM 2016-03-16 09:28:11
5  19-MAY-16 03.50.38.000000000 PM 2016-05-19 15:50:38
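A quick way to confirm that the PM rows are handled correctly is to look at the parsed hours. The sketch below assumes a small hand-built DataFrame in place of the Excel import (the two rows are copied from the question) and uses %I, the 12-hour directive, which is what lets %p shift PM times into the afternoon:

```python
import pandas as pd

# Stand-in for the Excel import: two sample rows copied from the question.
df = pd.DataFrame({"Invoice Date": [
    "16-MAR-16 11.35.27.000000000 AM",
    "05-APR-16 05.21.14.000000000 PM",
]})

# %I = 12-hour clock, %p = AM/PM marker, %f = fractional seconds.
fmt = "%d-%b-%y %I.%M.%S.%f %p"
df["timestamp"] = pd.to_datetime(df["Invoice Date"], format=fmt)

# The PM row should come out as hour 17, not 5.
hours = df["timestamp"].dt.hour.tolist()
print(hours)
```

If the second value printed is 5 rather than 17, the format string is most likely still using %H.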
If you want two separate columns for the date and the time (note %I rather than %H for the hour, so that the AM/PM marker is applied):
d = pd.to_datetime(df['Invoice Date'], format='%d-%b-%y %I.%M.%S.%f %p')
df['date'] = d.dt.date
df['time'] = d.dt.time
print (df)
                      Invoice Date        date      time
0  16-MAR-16 11.35.27.000000000 AM  2016-03-16  11:35:27
1  05-APR-16 05.21.14.000000000 PM  2016-04-05  17:21:14
2  16-FEB-16 09.56.36.000000000 AM  2016-02-16  09:56:36
3  16-MAR-16 11.35.27.000000000 AM  2016-03-16  11:35:27
4  16-MAR-16 09.28.11.000000000 AM  2016-03-16  09:28:11
5  19-MAY-16 03.50.38.000000000 PM  2016-05-19  15:50:38
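As for attempt (2) in the question: datetime.strptime expects %-style directives (%d, %b, %y and so on), not patterns like dd-mmm-yy, and its %f accepts at most six digits, so the nine-digit fraction would still fail after the format is corrected. Below is a minimal sketch of one way to make that approach work, assuming an English locale for the month abbreviation and the AM/PM marker. The helper name parse_invoice_date is made up for illustration:

```python
import re
from datetime import datetime

def parse_invoice_date(text):
    """Parse strings like '05-APR-16 05.21.14.000000000 PM' (hypothetical helper)."""
    # Trim the 9-digit fraction to 6 digits, the most that %f accepts here.
    trimmed = re.sub(r"\.(\d{6})\d{3} ", r".\1 ", text)
    # %I is the 12-hour clock, so %p can move PM times past noon.
    return datetime.strptime(trimmed, "%d-%b-%y %I.%M.%S.%f %p")

parsed = parse_invoice_date("05-APR-16 05.21.14.000000000 PM")
print(parsed.date(), parsed.time())
```

It could then be applied with df["Invoice Date"].apply(parse_invoice_date), though pd.to_datetime with format is usually the simpler choice for a whole column.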