How to resolve a ValueError on date and time columns with Python numpy

Time: 2017-12-16 16:52:18

Tags: python csv numpy scikit-learn

I have these lines of code:

import numpy as np
f = open("scikit.csv")
f.readline() # skip the header
data = np.loadtxt(f, delimiter=",")

The first 5 lines of scikit.csv are as follows:

Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
2017-12-07,16:57:00,3.0,low,0.0,low,0.0,low,34.2944535011255,low,19.045624937577248,low
2017-12-07,16:58:00,2.0,low,0.0,low,0.0,low,34.2875445714863,low,18.601948615438673,low

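I want to use the sklearn library, and because my dataset is external, I tried to use numpy, following this guide (Loading from external datasets).
But when I try to convert my CSV into Python objects with np.loadtxt("scikit.csv", delimiter=","), I get an error.

It shows this error: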

ValueError: could not convert string to float: Date

If I change the code like this:

data = np.loadtxt(f, delimiter="," ,dtype="datetime64")

it shows another error, this time on the time column, as follows:

ValueError: Error parsing datetime string "Date" at position 0

Could you guide me on how to solve this problem with the DateTime columns?
I am not required to use numpy.

1 Answer:

Answer 0: (score: 0)

The numpy way:

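import time

import numpy as np
import dateutil.parser as dparser

# Option 1: Date as numpy.datetime64, Time parsed by dateutil.
print('option 1')
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=',',
                 converters={0: np.datetime64,
                             1: dparser.parse}))

# Option 2: Time parsed with time.strptime. Depending on the NumPy version,
# converters receive bytes or str, so decode only when needed.
print('option 2')
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=',',
                 converters={0: np.datetime64,
                             1: lambda t: time.strptime(
                                 t.decode('utf-8') if isinstance(t, bytes) else t,
                                 '%H:%M:%S')}))

# Option 3: no converters, the fields are left unconverted in an object array.
print('option 3')
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=','))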
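
If the goal is only a numeric feature matrix for scikit-learn, one possible shortcut is to skip the date, time and text columns altogether with usecols. This is a minimal sketch, not a definitive solution: it assumes the column layout shown in the question, and it inlines the sample rows through io.StringIO (in place of the real scikit.csv) so that it runs on its own. The names CSV_TEXT, NUMERIC_COLUMNS, features and labels are made up for this example.

```python
import io

import numpy as np

# Sample rows copied from the question; the real code would open scikit.csv.
CSV_TEXT = """\
Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
2017-12-07,16:57:00,3.0,low,0.0,low,0.0,low,34.2944535011255,low,19.045624937577248,low
2017-12-07,16:58:00,2.0,low,0.0,low,0.0,low,34.2875445714863,low,18.601948615438673,low
"""

# Zero-based indexes of the float columns (assumed from the header above).
NUMERIC_COLUMNS = (2, 4, 6, 8, 10)

# skiprows=1 drops the header; usecols keeps only the numeric columns.
features = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',', skiprows=1,
                      usecols=NUMERIC_COLUMNS)

# The class label (last column) is read separately as text.
labels = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=',', skiprows=1,
                    usecols=11, dtype=str)

print(features.shape)
print(labels)
```

Reading the labels in a second pass keeps features a plain float array, which is the form scikit-learn estimators generally expect.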

Using pandas:

import pandas as pd
df = pd.read_csv('scikit.csv', parse_dates=['Date', 'Time'])
                 #converters={'Date': np.datetime64, 'Time': lambda t: pd.to_datetime(t, format='%H:%M:%S')})
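
Parsing Date and Time as two separate columns leaves the time column without a meaningful date of its own, so it may be more convenient to combine both into one timestamp. The following is a sketch under the same assumptions as the question's sample data; the sample rows are inlined through io.StringIO instead of reading scikit.csv, and the names CSV_TEXT, Timestamp, X and y are chosen only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real code would read scikit.csv.
CSV_TEXT = """\
Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
2017-12-07,16:57:00,3.0,low,0.0,low,0.0,low,34.2944535011255,low,19.045624937577248,low
2017-12-07,16:58:00,2.0,low,0.0,low,0.0,low,34.2875445714863,low,18.601948615438673,low
"""

# Read Date and Time as plain text first.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Join the two text columns and parse them as a single timestamp.
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                 format='%Y-%m-%d %H:%M:%S')

# Numeric columns become the feature matrix; the datetime and text
# columns are excluded by select_dtypes.
X = df.select_dtypes(include='number').to_numpy()
y = df['Final_Class'].to_numpy()

print(df[['Timestamp', 'Final_Target', 'Final_Class']])
print(X.shape)
```

X and y can then be passed to a scikit-learn estimator, while the Timestamp column stays available in df for indexing or resampling.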