I have xls data with a special date format, for example:
start day(utc) start time(utc)
20160401 100
20160401 200
20160401 300
20160401 400
20160401 500
I want to parse it into the format 2016-04-01 1:00. I read the table with pandas:
from datetime import datetime
import pandas as pd

parse = lambda x: datetime.strptime(str(x), '%Y%m%d %H')
content = pd.read_excel(filepath, skiprows=1,
                        na_values=['nan', -9999.0, 9999.0,
                                   '-9999.0 -', -99, '-99.000 -', -999],
                        parse_cols=[1,2,3,4,5,6,7,8,9,10,11,12,14],
                        header=None, parse_dates=[0,1],
                        index_col=0,
                        date_parser=parse)
But an error occurs:
File "D:\Anaconda2\lib\_strptime.py", line 332, in _strptime
(data_string, format))
ValueError: time data '100' does not match format '%Y%m%d'
How should I handle this?
Answer 0 (score: 0)
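You can use to_timedelta, but the time column has to be divided by 100 first, because 100 means 1 hour, 200 means 2 hours, and so on: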
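import pandas as pd

df = pd.read_excel(filepath, skiprows=1,
                   na_values=['nan', -9999.0, 9999.0,
                              '-9999.0 -', -99, '-99.000 -', -999],
                   usecols=[1,2,3,4,5,6,7,8,9,10,11,12,14],  # called parse_cols in pandas < 0.21
                   header=None, parse_dates=[0],
                   index_col=0)
# with header=None the columns have no names, so take the time column by position;
# assumption: it is the first column left after the day column became the index
time_col = df.columns[0]
# 100 -> 1.0 h, 200 -> 2.0 h, ... (only correct if every value is a whole hour)
df.index = df.index + pd.to_timedelta(df[time_col].values / 100., unit='h')
df = df.drop(time_col, axis=1)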
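A self-contained sketch of the same to_timedelta idea, with a small in-memory DataFrame standing in for the Excel file (the DataFrame and its values are hypothetical, modeled on the question's layout). It also shows an assumed HHMM reading of the time column: if a value such as 130 can occur and means 1:30, dividing by 100 would give 1.3 hours (1:18), so splitting the value into hours (// 100) and minutes (% 100) is safer.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the sheet in the question; in practice
# these columns would come out of pd.read_excel.
df = pd.DataFrame({
    'start day(utc)': [20160401] * 5,
    'start time(utc)': [100, 200, 300, 400, 500],
    'a': [1, 7, 7, 5, 3],
})

# Day column: integers such as 20160401, parsed with an explicit format.
day = pd.to_datetime(df['start day(utc)'].astype(str), format='%Y%m%d')
times = df['start time(utc)']

# The answer's approach: 100 -> 1.0 h, 200 -> 2.0 h, ...
# Only right when every value is a whole hour.
simple = day + pd.to_timedelta(times / 100., unit='h')

# Assumed HHMM reading: 130 -> 1 h + 30 min instead of 1.3 h.
start = (day
         + pd.to_timedelta(times // 100, unit='h')
         + pd.to_timedelta(times % 100, unit='min'))

# Use the combined timestamp as the index and drop the two source columns.
df.index = pd.DatetimeIndex(start, name='start(utc)')
df = df.drop(['start day(utc)', 'start time(utc)'], axis=1)
print(df)
```

For the whole-hour values in the question, both variants give the same timestamps; they only differ once minutes appear.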
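If the division is not necessary (the hours are already 0, 1, 2, ..., 23), change parse_dates = [0,1] to parse_dates = [[0,1]]. With [0,1] each column is parsed on its own, so the parser only sees a value like '100'; with [[0,1]] both columns are joined into one string such as '20160401 1' before parsing: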
Sample:
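import pandas as pd
from datetime import datetime
from io import StringIO  # pandas.compat.StringIO no longer exists in recent pandas

temp=u"""start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""
# after testing, replace StringIO(temp) with 'filename.csv'
# the lambda ends up being called on each row's two values joined by a space, e.g. '20160401 1'
parse = lambda x: datetime.strptime(x, '%Y%m%d %H')
# parse_dates=[[0,1]] combines columns 0 and 1 into one datetime column;
# both this nested form and date_parser are deprecated in pandas 2.x
df = pd.read_csv(StringIO(temp), sep=";",
                 parse_dates=[[0,1]],
                 index_col=0,
                 date_parser=parse)
print(df)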
a
start day(utc)_start time(utc)
2016-04-01 01:00:00 1
2016-04-01 02:00:00 7
2016-04-01 03:00:00 7
2016-04-01 04:00:00 5
2016-04-01 05:00:00 3
print (df.index)
DatetimeIndex(['2016-04-01 01:00:00', '2016-04-01 02:00:00',
'2016-04-01 03:00:00', '2016-04-01 04:00:00',
'2016-04-01 05:00:00'],
dtype='datetime64[ns]', name='start day(utc)_start time(utc)', freq=None)
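On pandas 2.x, date_parser and the nested parse_dates=[[0,1]] form are deprecated. A minimal sketch of the same result without them, assuming the hours are plain 0..23 as in the sample: read both columns as strings, pad the hour to two digits with zfill, and parse the joined string with an explicit format.

```python
from io import StringIO

import pandas as pd

temp = """start day(utc);start time(utc);a
20160401;1;1
20160401;2;7
20160401;3;7
20160401;4;5
20160401;5;3"""

# Read the day and hour columns as strings so nothing is converted yet.
df = pd.read_csv(StringIO(temp), sep=';',
                 dtype={'start day(utc)': str, 'start time(utc)': str})

# '20160401' + '01' -> '2016040101', parsed with a separator-free format.
stamp = pd.to_datetime(df['start day(utc)'] + df['start time(utc)'].str.zfill(2),
                       format='%Y%m%d%H')

# Replace the two source columns with a DatetimeIndex named like the original.
df = df.drop(columns=['start day(utc)', 'start time(utc)'])
df.index = pd.DatetimeIndex(stamp, name='start day(utc)_start time(utc)')
print(df)
```

This does not cover the question's 100, 200, ... values directly; for those, the to_timedelta approach above is the better fit.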