I am trying to load a CSV file in the following format:
40010 40015 40020 40025 40030 40035 40040 40045
2008-11-03 00:00 786 212 779 227 220 131 680 1006
2008-11-03 00:03 760 200 765 234 225 133 694 1063
2008-11-03 00:06 757 205 769 237 230 136 726 1051
2008-11-03 00:09 781 207 765 240 235 137 711 1040
2008-11-03 00:12 759 203 751 232 225 134 717 1088
...
The file is comma-separated; the columns are not fixed-width.
I want the row index to be a datetime, so this is what I do when loading the file:
import datetime
import pandas as pd
def dateparse (timestamp):
    return datetime.datetime.strptime(timestamp, '%Y-%m-%d %I:%M')
global_data_train = pd.read_csv('RTAHistorical.csv', sep=",",parse_dates=True, date_parser=dateparse, header=0, index_col=0, skip_blank_lines = True, engine='python')
But I get the following error:
TypeError: strptime() argument 1 must be str, not numpy.ndarray
Since I have seen some people use the same approach successfully, I don't understand this error.
What am I doing wrong?
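The type part of the error can be reproduced without pandas. This is a minimal sketch: the timestamp strings come from the sample data, and the array stands in for a whole column. It assumes (behaviour varies between pandas versions) that read_csv may first call date_parser with an entire column as a NumPy array and only then fall back to one value at a time, which would explain the numpy.ndarray in the message. FMT and column are names introduced here for illustration.

```python
import datetime

import numpy as np

FMT = "%Y-%m-%d %H:%M"  # 24-hour format, used here to isolate the type problem

# One string at a time: this is what strptime expects.
ok = datetime.datetime.strptime("2008-11-03 00:03", FMT)
print(ok)  # 2008-11-03 00:03:00

# A whole column (a NumPy array of strings) is rejected with a TypeError
# like the one quoted in the question.
column = np.array(["2008-11-03 00:00", "2008-11-03 00:03"], dtype=object)
try:
    datetime.datetime.strptime(column, FMT)
    array_error = None
except TypeError as exc:
    array_error = str(exc)
print(array_error)
```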
Answer (score: 1)
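For me it works after changing the format to %Y-%m-%d %H:%M (%H is the 24-hour clock, 00-23, while %I is the 12-hour clock, 01-12, so an hour of 00 can never match %I):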
import datetime
def dateparse (timestamp):
    # pd.datetime was removed in pandas 2.0; use the datetime module directly
    return datetime.datetime.strptime(timestamp, '%Y-%m-%d %H:%M')
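The difference between %I and %H can be checked directly with the standard library. A small sketch using one timestamp from the sample data; stamp and i_error are illustrative names, not part of the original code.

```python
import datetime

stamp = "2008-11-03 00:03"

# %H is the 24-hour clock (00-23), so hour "00" is accepted.
parsed = datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M")
print(parsed)  # 2008-11-03 00:03:00

# %I is the 12-hour clock (01-12); hour "00" does not match it.
try:
    datetime.datetime.strptime(stamp, "%Y-%m-%d %I:%M")
    i_error = None
except ValueError as exc:
    i_error = exc
print(i_error)
```

The same ValueError would also appear for any afternoon hour such as 13:00 with %I.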
Sample:
import pandas as pd
import datetime
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions
temp=u"""40010,40015,40020,40025,40030,40035,40040,40045
2008-11-03 00:00,786,212,779,227,220,131,680,1006
2008-11-03 00:03,760,200,765,234,225,133,694,1063
2008-11-03 00:06,757,205,769,237,230,136,726,1051
2008-11-03 00:09,781,207,765,240,235,137,711,1040
2008-11-03 00:12,759,203,751,232,225,134,717,1088"""
# after testing, replace StringIO(temp) with the file name
def dateparse(timestamp):
    # %H = 24-hour clock (00-23)
    return datetime.datetime.strptime(timestamp, '%Y-%m-%d %H:%M')
global_data_train = pd.read_csv(StringIO(temp),
                                sep=",",
                                parse_dates=True,
                                date_parser=dateparse,  # deprecated since pandas 2.0 (see date_format)
                                header=0,
                                index_col=0,
                                skip_blank_lines=True,
                                engine='python')
print (global_data_train)
40010 40015 40020 40025 40030 40035 40040 40045
2008-11-03 00:00:00 786 212 779 227 220 131 680 1006
2008-11-03 00:03:00 760 200 765 234 225 133 694 1063
2008-11-03 00:06:00 757 205 769 237 230 136 726 1051
2008-11-03 00:09:00 781 207 765 240 235 137 711 1040
2008-11-03 00:12:00 759 203 751 232 225 134 717 1088
print (global_data_train.index)
DatetimeIndex(['2008-11-03 00:00:00', '2008-11-03 00:03:00',
'2008-11-03 00:06:00', '2008-11-03 00:09:00',
'2008-11-03 00:12:00'],
dtype='datetime64[ns]', freq=None)
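If pandas may hand date_parser either a whole column or a single string, a parser that accepts both avoids the TypeError whichever path is taken. A minimal sketch that swaps strptime for pd.to_datetime with an explicit format (a substitution, not the answer's method); FMT is a hypothetical name for the file's timestamp format.

```python
import numpy as np
import pandas as pd

FMT = "%Y-%m-%d %H:%M"  # hypothetical constant for the file's timestamp format


def dateparse(timestamp):
    # pd.to_datetime accepts a single string as well as an array of strings,
    # so the same function works for one value or for a whole column.
    return pd.to_datetime(timestamp, format=FMT)


single = dateparse("2008-11-03 00:03")
many = dateparse(np.array(["2008-11-03 00:00", "2008-11-03 00:03"], dtype=object))
print(single)  # 2008-11-03 00:03:00
print(many)
```

It could be passed as date_parser=dateparse in the read_csv call above; note that pandas 2.0 deprecated date_parser in favour of the date_format parameter.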
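You can also omit date_parser=dateparse, together with sep, header, index_col and engine: the header row has one name fewer than each data row, so pandas uses the first column as the index automatically, and parse_dates=True parses that index with the default date parser.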
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions
temp=u"""40010,40015,40020,40025,40030,40035,40040,40045
2008-11-03 00:00,786,212,779,227,220,131,680,1006
2008-11-03 00:03,760,200,765,234,225,133,694,1063
2008-11-03 00:06,757,205,769,237,230,136,726,1051
2008-11-03 00:09,781,207,765,240,235,137,711,1040
2008-11-03 00:12,759,203,751,232,225,134,717,1088"""
# after testing, replace StringIO(temp) with the file name
global_data_train = pd.read_csv(StringIO(temp),
                                parse_dates=True,
                                skip_blank_lines=True)
print (global_data_train)
40010 40015 40020 40025 40030 40035 40040 40045
2008-11-03 00:00:00 786 212 779 227 220 131 680 1006
2008-11-03 00:03:00 760 200 765 234 225 133 694 1063
2008-11-03 00:06:00 757 205 769 237 230 136 726 1051
2008-11-03 00:09:00 781 207 765 240 235 137 711 1040
2008-11-03 00:12:00 759 203 751 232 225 134 717 1088
print (global_data_train.index)
DatetimeIndex(['2008-11-03 00:00:00', '2008-11-03 00:03:00',
'2008-11-03 00:06:00', '2008-11-03 00:09:00',
'2008-11-03 00:12:00'],
dtype='datetime64[ns]', freq=None)
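An alternative that avoids date_parser entirely is to read the timestamps as plain strings and convert the index afterwards with an explicit format. A sketch, assuming the same header layout as above (one name fewer than the data fields, so the first column becomes the index implicitly); only the first three sample rows are used.

```python
from io import StringIO

import pandas as pd

# First three rows of the sample data from the question
temp = """40010,40015,40020,40025,40030,40035,40040,40045
2008-11-03 00:00,786,212,779,227,220,131,680,1006
2008-11-03 00:03,760,200,765,234,225,133,694,1063
2008-11-03 00:06,757,205,769,237,230,136,726,1051"""

# No date parsing here: the first column becomes the index as plain strings
# because the header has one field fewer than the data rows.
df = pd.read_csv(StringIO(temp))

# Convert the index afterwards with an explicit 24-hour format.
df.index = pd.to_datetime(df.index, format="%Y-%m-%d %H:%M")
print(df.index)
```

With an explicit format, a mismatch such as %I versus %H raises a ValueError instead of relying on format inference.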