I have these lines of code:
import numpy as np
f = open("scikit.csv")
f.readline() # skip the header
data = np.loadtxt(f, delimiter=",")
The first 5 lines of scikit.csv are as follows:
Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
2017-12-07,16:57:00,3.0,low,0.0,low,0.0,low,34.2944535011255,low,19.045624937577248,low
2017-12-07,16:58:00,2.0,low,0.0,low,0.0,low,34.2875445714863,low,18.601948615438673,low
I want to use the sklearn library. Because my dataset is external, I tried using numpy as described in this (Loading from external datasets) guide.
But when I try to use numpy to convert my CSV into a Python object with np.loadtxt("scikit.csv", delimiter=","), I get an error.
It shows this error:
ValueError: could not convert string to float: Date
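For context: np.loadtxt uses dtype=float by default, so every field it reads must parse as a number, and the header and the Date, Time and "low" text fields all fail. Below is a minimal sketch that skips the header with skiprows=1 and restricts float parsing to the numeric columns with usecols. It assumes the column layout shown above; SAMPLE is a hypothetical in-memory stand-in for scikit.csv (via io.StringIO) so the snippet runs on its own, and the names numeric and labels are illustrative.

```python
import io
import numpy as np

# Hypothetical stand-in for scikit.csv (header + first two data rows)
SAMPLE = """Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
"""

# Indices of the purely numeric columns (the *_Average columns and Final_Target)
NUMERIC_COLS = (2, 4, 6, 8, 10)

# skiprows=1 drops the header line; usecols keeps float parsing off the text columns
numeric = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                     usecols=NUMERIC_COLS)

# Final_Class (index 11) is text, so read it separately as strings
labels = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                    usecols=11, dtype=str)

print(numeric.shape)
print(labels)
```

With the real file, pass 'scikit.csv' instead of io.StringIO(SAMPLE).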
If I change the code like this:
data = np.loadtxt(f, delimiter="," ,dtype="datetime64")
it shows another error on the Time column, as follows:
ValueError: Error parsing datetime string "Date" at position 0
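The second error arises because a single dtype passed to np.loadtxt is applied to every column, and fields such as "16:55:00" or "low" are not valid datetime64 strings. One way to get a different type per column is np.genfromtxt with dtype=None, which guesses each column's type, and names=True, which takes field names from the header. This is a sketch under the same assumptions as before (SAMPLE is a hypothetical in-memory copy of the file); genfromtxt does not auto-detect dates, so Date is converted explicitly afterwards.

```python
import io
import numpy as np

# Hypothetical stand-in for scikit.csv (header + first two data rows)
SAMPLE = """Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
"""

# dtype=None: guess a type per column; names=True: use the header as field names
data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

# Date comes back as text; convert it to day-resolution datetime64
dates = data["Date"].astype("datetime64[D]")

print(data.dtype.names[:3])
print(dates, data["Time"], data["CPUUtilization_Average"])
```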
Can you guide me on how to solve this problem for the Date and Time columns?
Without using the power of numpy.
Answer 0 (score: 0)
Using numpy:
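import time
import numpy as np
import dateutil.parser as dparser

def as_text(field):
    # numpy < 1.23 passes raw bytes to converters; newer versions pass str
    return field.decode('utf-8') if isinstance(field, bytes) else field

print('option 1')
# Column 0 -> numpy.datetime64, column 1 -> datetime.datetime via dateutil
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=',',
                 converters={0: np.datetime64,
                             1: dparser.parse}))

print('option 2')
# Column 1 -> time.struct_time via time.strptime
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=',',
                 converters={0: np.datetime64,
                             1: lambda t: time.strptime(as_text(t), '%H:%M:%S')}))

print('option 3')
# No converters: every field is kept as a string
print(np.loadtxt('scikit.csv', dtype='object', skiprows=1, delimiter=','))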
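If a single timestamp per row is more useful than separate date and time objects, the two text columns can be joined into ISO 8601 strings and cast to datetime64. A minimal numpy-only sketch, assuming the same file layout; SAMPLE is a hypothetical in-memory copy of the file, and the choice of second resolution ("datetime64[s]") is an assumption based on the HH:MM:SS times.

```python
import io
import numpy as np

# Hypothetical stand-in for scikit.csv (header + first two data rows)
SAMPLE = """Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
"""

# Read only Date and Time as text; ndmin=2 keeps a 2-D shape even for one row
date_time = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                       usecols=(0, 1), dtype=str, ndmin=2)

# Join "YYYY-MM-DD" and "HH:MM:SS" with "T" into ISO 8601 strings
iso = np.char.add(np.char.add(date_time[:, 0], "T"), date_time[:, 1])

# Cast the ISO strings to second-resolution timestamps
timestamps = iso.astype("datetime64[s]")
print(timestamps)
```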
Using pandas:
import pandas as pd
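df = pd.read_csv('scikit.csv', parse_dates=['Date', 'Time'])
# Alternative with explicit per-column converters (also needs `import numpy as np`):
# df = pd.read_csv('scikit.csv',
#                  converters={'Date': np.datetime64,
#                              'Time': lambda t: pd.to_datetime(t, format='%H:%M:%S')})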
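A possible follow-up in pandas is to combine Date and Time into a single timestamp column and then hand plain NumPy arrays to sklearn. This is a sketch, not part of the original answer: SAMPLE is a hypothetical in-memory copy of the file, and treating the four *_Average columns as features and Final_Class as the label is an assumption about how the data is meant to be used.

```python
import io
import pandas as pd

# Hypothetical stand-in for scikit.csv (header + first two data rows)
SAMPLE = """Date,Time,CPUUtilization_Average,CPUUtilization_Target,NetworkIn_Average,NetworkIn_Target,NetworkOut_Average,NetworkOut_Target,MemoryUtilization_Average,MemoryUtilization_Target,Final_Target,Final_Class
2017-12-07,16:55:00,17.0,low,0.0,low,0.0,low,5.47756198097301,low,10.312904877501694,low
2017-12-07,16:56:00,11.0,low,0.0,low,0.0,low,34.1503819977678,low,22.492003834477718,low
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Join the Date and Time text columns; an explicit format avoids format guessing
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%Y-%m-%d %H:%M:%S")

# Assumed feature columns and label column, as NumPy arrays for sklearn
FEATURES = ["CPUUtilization_Average", "NetworkIn_Average",
            "NetworkOut_Average", "MemoryUtilization_Average"]
X = df[FEATURES].to_numpy()
y = df["Final_Class"].to_numpy()

print(df["Timestamp"])
print(X.shape, y)
```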