As I understand it, scikit-learn accepts data in (n_samples, n_features) format, which is a 2D array. Suppose I have data in a table like this...
Stock prices indicator1 indicator2
2.0 123 1252
1.0 .. ..
.. . .
.
How do I import it?
Answer 0 (score: 56)
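A very good alternative to numpy's loadtxt is read_csv from Pandas. The data is loaded into a Pandas DataFrame, with the big advantage that it can handle mixed data types, for example some columns containing text and others containing numbers. You can then easily select only the numeric columns and convert them to a numpy array with to_numpy() (called as_matrix in older pandas versions; as_matrix was removed in pandas 1.0). Pandas can also read and write Excel files and a bunch of other formats.
If we have a csv file named "mydata.csv":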
point_latitude,point_longitude,line,construction,point_granularity
30.102261, -81.711777, Residential, Masonry, 1
30.063936, -81.707664, Residential, Masonry, 3
30.089579, -81.700455, Residential, Wood , 1
30.063236, -81.707703, Residential, Wood , 3
30.060614, -81.702675, Residential, Wood , 1
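This reads in the csv and converts the numeric columns into a numpy array for scikit-learn, then reverses the order of the columns and writes the result to an Excel spreadsheet: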
import numpy as np
import pandas as pd
input_file = "mydata.csv"
# comma delimited is the default
df = pd.read_csv(input_file, header = 0)
# for space delimited use:
# df = pd.read_csv(input_file, header = 0, delimiter = " ")
# for tab delimited use:
# df = pd.read_csv(input_file, header = 0, delimiter = "\t")
# put the original column names in a python list
original_headers = list(df.columns.values)
# keep only the numeric columns (public replacement for the private _get_numeric_data())
df = df.select_dtypes(include="number")
# put the numeric column names in a python list
numeric_headers = list(df.columns.values)
# create a numpy array with the numeric values for input into scikit-learn
# (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
numpy_array = df.to_numpy()
# reverse the order of the columns
numeric_headers.reverse()
reverse_df = df[numeric_headers]
# write reverse_df to an Excel spreadsheet
# (recent pandas versions no longer write .xls; .xlsx needs a writer such as openpyxl)
reverse_df.to_excel('path_to_file.xlsx')
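As a quick check that the result has the (n_samples, n_features) shape scikit-learn expects, here is a minimal self-contained sketch. It assumes the sample rows above are pasted into a string instead of being read from "mydata.csv", and it passes skipinitialspace=True because the sample has a space after each comma; both choices are illustrative, not part of the original answer.

```python
import io

import pandas as pd

# The sample rows from above, held in a string so the sketch runs without a file.
csv_text = """point_latitude,point_longitude,line,construction,point_granularity
30.102261, -81.711777, Residential, Masonry, 1
30.063936, -81.707664, Residential, Masonry, 3
30.089579, -81.700455, Residential, Wood   , 1
30.063236, -81.707703, Residential, Wood   , 3
30.060614, -81.702675, Residential, Wood   , 1
"""

# skipinitialspace=True drops the blank that follows each comma in the sample.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Keep only the numeric columns, then convert to a 2D numpy array.
numeric_df = df.select_dtypes(include="number")
X = numeric_df.to_numpy()

print(list(numeric_df.columns))
print(X.shape)  # rows are samples, columns are features
```

The text columns (line, construction) are dropped here; if they matter for the model, they would need to be encoded as numbers first rather than discarded.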
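Answer 1 (score: 51)
This is not a CSV file; it is just a whitespace-separated file. Assuming there are no missing values, you can easily load it into a Numpy array called data with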
import numpy as np
# the with block closes the file once loading is done
with open("filename.txt") as f:
    f.readline()  # skip the header
    data = np.loadtxt(f)
If the stock price is what you want to predict (your y value, in scikit-learn terms), then you should split data using
X = data[:, 1:] # select columns 1 through end
y = data[:, 0] # select column 0, the stock price
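A small sketch of the same load-and-split, run on an in-memory string so the resulting shapes can be inspected. Only the first data row comes from the question; the other two rows are made-up placeholders, and skiprows=1 is used here in place of the readline() call.

```python
import io

import numpy as np

# Placeholder data: the first row is from the question,
# the remaining rows are invented purely for illustration.
text = """Stock prices indicator1 indicator2
2.0 123 1252
1.5 130 1300
1.8 127 1275
"""

# skiprows=1 skips the header line; whitespace is the default delimiter.
data = np.loadtxt(io.StringIO(text), skiprows=1)

X = data[:, 1:]  # features: every column except the first
y = data[:, 0]   # target: the stock price column

print(X.shape, y.shape)
```

X stays two-dimensional, one row per sample, while y is a one-dimensional vector with one entry per sample, which is the layout scikit-learn estimators take in fit(X, y).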
Alternatively, you may be able to get the standard Python csv module to handle this type of file.
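One way the csv module route might look, as a rough sketch: it assumes fields are separated by single spaces and that every value after the header is numeric. The second data row is a made-up placeholder.

```python
import csv
import io

import numpy as np

# Placeholder data: the first row is from the question, the second is invented.
text = "Stock prices indicator1 indicator2\n2.0 123 1252\n1.5 130 1300\n"

# Treat a space as the delimiter; skipinitialspace ignores extra blanks after it.
reader = csv.reader(io.StringIO(text), delimiter=" ", skipinitialspace=True)
next(reader)  # skip the header line

# Convert each non-empty row to floats, then stack the rows into a 2D array.
rows = [[float(value) for value in row] for row in reader if row]
data = np.array(rows)

print(data.shape)
```

For purely numeric files np.loadtxt is shorter; the csv module is mainly useful when rows need custom handling before conversion.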
Answer 2 (score: 17)
Answer 3 (score: 1)
Load a CSV file with numpy:
import numpy as np
dataset = np.loadtxt('./example.csv', delimiter=',')
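Two details tend to matter when feeding the result to scikit-learn: a header line has to be skipped, and a file with a single data row comes back as a 1D array unless ndmin=2 is requested. A minimal sketch, assuming a hypothetical header (price, indicator1, indicator2) and an in-memory string in place of './example.csv':

```python
import io

import numpy as np

# Hypothetical one-row CSV with a header; the column names are illustrative.
csv_text = "price,indicator1,indicator2\n2.0,123,1252\n"

# Without ndmin, a single data row collapses to a 1D array.
flat = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# ndmin=2 keeps the (n_samples, n_features) layout even for one row.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, ndmin=2)

print(flat.shape, dataset.shape)
```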