I am trying out the latest development version of numpy 2.0:
np.__version__
Out[44]: '2.0.0.dev-aded70c'
I am trying to import CSV data that looks like this:
date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather
2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine
2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine
2007-01-01 01:30,455.9,,,,,455.9,,,,
etc.
using the following code:
convertdict = {0: lambda s: np.datetime64(s, 'm'), 1: lambda s: float(s or 0), 2: lambda s: float(s or 0), 3: lambda s: float(s or 0), 4: lambda s: float(s or 0), 5: lambda s: float(s or 0), 6: lambda s: float(s or 0), 7: lambda s: float(s or 0), 8: str, 9: str, 10: str}
dt = [('date', np.datetime64), ('system', float), ('pumping', float), ('rgt', float), ('agt', float), ('sps', float), ('eskom_import', float), ('temperature', float), ('wind', str), ('pressure', float), ('weather', str)]
a = np.recfromcsv(fp, dtype=dt, converters=convertdict, usecols=range(11), names=True)
The dtype it produces for a.date is 'object':
array([2007-01-01T00:30+0200, 2007-01-01T01:00+0200, 2007-01-01T01:30+0200,
..., 2007-12-31T23:00+0200, 2007-12-31T23:30+0200,
2008-01-01T00:00+0200], dtype=object)
But I need it to be datetime64, as in this example (but including hours and minutes):
array(['2011-07-11', '2011-07-12', '2011-07-13', '2011-07-14',
'2011-07-15', '2011-07-16', '2011-07-17'], dtype='datetime64[D]')
The CSV import seems to create an embedded object date type for 'date' rather than the datetime64 dtype. Any ideas on how to fix this?
林
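Answer 0: (score: 1)
I think the trick to avoiding the generic 'object' type is to avoid the recfromcsv function. Reading the data file manually and parsing each field yields the requested dtype='datetime64[m]':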
import numpy as np

# 'U8' and 'U16' are assumed string widths; a bare str field would have zero length.
dt = np.dtype([('date', '<M8[m]'),
               ('system', '<f8'),
               ('pumping', '<f8'),
               ('rgt', '<f8'),
               ('agt', '<f8'),
               ('sps', '<f8'),
               ('eskom_import', '<f8'),
               ('temperature', '<f8'),
               ('wind', 'U8'),
               ('pressure', '<f8'),
               ('weather', 'U16')])

def to_float(s):
    # Empty CSV fields become 0.0, as in the question's converters.
    return float(s or 0)

# Read the file in text mode; keep the header separately and skip blank lines.
with open('data.csv') as fid:
    fieldnames = fid.readline().strip().split(',')  # Header
    lines = [line.strip() for line in fid if line.strip()]

# Allocate one record per data line, then fill it field by field.
data = np.zeros(len(lines), dtype=dt)
for count, line in enumerate(lines):
    parsedline = line.split(',')
    data['date'][count] = np.datetime64(parsedline[0], 'm')
    data['system'][count] = to_float(parsedline[1])
    data['pumping'][count] = to_float(parsedline[2])
    data['rgt'][count] = to_float(parsedline[3])
    data['agt'][count] = to_float(parsedline[4])
    data['sps'][count] = to_float(parsedline[5])
    data['eskom_import'][count] = to_float(parsedline[6])
    data['temperature'][count] = to_float(parsedline[7])
    data['wind'][count] = parsedline[8]
    data['pressure'][count] = to_float(parsedline[9])
    data['weather'][count] = parsedline[10]
>>> data['date']
array(['2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
'2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
'2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
'2007-01-01T00:30-0500', '2007-01-01T01:00-0500'], dtype='datetime64[m]')
You could improve this code by using "convertdict" and iterating over the parsed line, but the idea is the same.
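As a rough sketch of that improvement, the per-field assignments can be driven by a table of converters instead of being written out one by one. The names FIELDS, to_minutes, to_float and load_records are made up for this illustration, the string widths U8 and U16 are assumptions, and the sample rows are the ones from the question, read from an in-memory buffer rather than a real file:

```python
import io
import numpy as np

# Sample rows copied from the question; a real script would open the CSV file.
SAMPLE = """date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather
2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine
2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine
2007-01-01 01:30,455.9,,,,,455.9,,,,
"""

def to_minutes(s):
    # Swap the space for 'T' so the string is plain ISO 8601.
    return np.datetime64(s.replace(' ', 'T'), 'm')

def to_float(s):
    # Empty fields become 0.0, mirroring the question's converters.
    return float(s or 0)

# (field name, dtype, converter); the string widths are assumed, not from the source.
FIELDS = [
    ('date', 'M8[m]', to_minutes),
    ('system', 'f8', to_float),
    ('pumping', 'f8', to_float),
    ('rgt', 'f8', to_float),
    ('agt', 'f8', to_float),
    ('sps', 'f8', to_float),
    ('eskom_import', 'f8', to_float),
    ('temperature', 'f8', to_float),
    ('wind', 'U8', str),
    ('pressure', 'f8', to_float),
    ('weather', 'U16', str),
]

def load_records(fileobj):
    """Parse a CSV file object into a structured array using FIELDS."""
    dt = np.dtype([(name, kind) for name, kind, _ in FIELDS])
    converters = [conv for _, _, conv in FIELDS]
    fileobj.readline()  # skip the header line
    rows = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        values = line.split(',')
        # Apply each column's converter to its raw string.
        rows.append(tuple(conv(v) for conv, v in zip(converters, values)))
    return np.array(rows, dtype=dt)

data = load_records(io.StringIO(SAMPLE))
print(data['date'].dtype)
```

Building the rows as a list of tuples also removes the need to know the number of lines before allocating the array.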