Python - SKLearn Fit Array Error

Time: 2018-03-27 13:47:50

Tags: python scikit-learn

I'm relatively new to data analysis with sklearn and Python, and I'm trying to run some linear regressions on a dataset loaded from a .csv file.

I passed the data to train_test_split without any problems, but when I try to fit the training data I get the error ValueError: Expected 2D array, got 1D array instead: ... Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

The error is raised at:

model = lm.fit(X_train, y_train)
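A minimal reproduction of this error, as a sketch: instead of reading linear-regression.csv (not available here), it uses the Sunlight and Growth values from the seven sample rows shown below and passes a 1D feature array straight to LinearRegression.fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sunlight and Growth values copied from the sample CSV rows in the question
sunlight = np.array([611, 507, 994, 1014, 850, 551, 506])
growth = np.array([44, 30, 55, 50, 78, 81, 59])

lm = LinearRegression()
try:
    # A 1D feature array has shape (7,), which scikit-learn rejects for X
    lm.fit(sunlight, growth)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The message printed is the same "Expected 2D array, got 1D array instead" error, so the problem reproduces without pandas or train_test_split involved.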

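Since I'm new to these packages, I'm trying to work out whether this happens because I didn't convert the imported CSV into a pandas DataFrame before running the regression, or whether it's caused by something else.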

My CSV is formatted as:

Month,Date,Day of Week,Growth,Sunlight,Plants
7,7/1/17,Saturday,44,611,26
7,7/2/17,Sunday,30,507,14
7,7/5/17,Wednesday,55,994,25
7,7/6/17,Thursday,50,1014,23
7,7/7/17,Friday,78,850,49
7,7/8/17,Saturday,81,551,50
7,7/9/17,Sunday,59,506,29

Here is how I set up the regression:

import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt


organic = pd.read_csv("linear-regression.csv")

organic.columns
# Index(['Month', 'Date', 'Day of Week', 'Growth', 'Sunlight', 'Plants'], dtype='object')

# Set the dependent (Growth) and independent (Sunlight) variables
y = organic['Growth']
X = organic['Sunlight']

# Test train split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
# (192,) (49,)
# (192,) (49,)

lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)

# Error pointing to an array with values from Sunlight [611, 507, 994, ...]
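On the DataFrame question: pd.read_csv already returns a DataFrame, but selecting a single column with single brackets gives a 1D Series. The sketch below assumes a small DataFrame built from the sample rows (only the Growth and Sunlight columns) and compares single-bracket and double-bracket selection.

```python
import pandas as pd

# A small DataFrame built from the sample rows in the question's CSV
organic = pd.DataFrame({
    "Growth": [44, 30, 55, 50, 78, 81, 59],
    "Sunlight": [611, 507, 994, 1014, 850, 551, 506],
})

# Single brackets select one column as a 1D Series
X_series = organic["Sunlight"]
# Double brackets select a list of columns, giving a 2D DataFrame
X_frame = organic[["Sunlight"]]

print(type(X_series).__name__, X_series.shape)  # Series (7,)
print(type(X_frame).__name__, X_frame.shape)    # DataFrame (7, 1)
```

So the data being a DataFrame is not the issue in itself; the shape of the selected X is.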

3 answers:

Answer 0 (score: 1)

You're only using one feature, so the error message tells you exactly what to do:

Reshape your data using array.reshape(-1, 1) if your data has a single feature.

scikit-learn requires the feature data X to be 2D.

(Also, don't forget the typo in X = organic['Sunglight'].)
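To make the difference between the two reshape options in the error message concrete, here is a small sketch; the arrays are the sample Sunlight and Growth values from the question, used only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

sunlight = np.array([611, 507, 994, 1014, 850, 551, 506])
growth = np.array([44, 30, 55, 50, 78, 81, 59])

# reshape(-1, 1): one column, as many rows as needed, so 7 samples x 1 feature
X = sunlight.reshape(-1, 1)
# reshape(1, -1): one row, so 1 sample x 7 features (not what we want here)
single_sample = sunlight.reshape(1, -1)

print(X.shape)              # (7, 1)
print(single_sample.shape)  # (1, 7)

model = LinearRegression().fit(X, growth)
print(model.coef_.shape)    # (1,): one coefficient for the single feature
```

With a single feature, reshape(-1, 1) is the right choice, because each row must be one sample.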

Answer 1 (score: 1)

You just need to reshape the feature data in your last line to

model = lm.fit(X_train.values.reshape(-1, 1), y_train)

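and the model fits. The reason is that sklearn's linear models expect

X: numpy array or sparse matrix of shape [n_samples, n_features]

So in this particular case the training data must have shape [n_samples, 1], which is (192, 1) for the training split in the question.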
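A sketch of the [n_samples, n_features] rule, assuming a DataFrame built from the seven sample rows. Adding Plants as a second feature is purely for illustration and is not part of the original question; it shows that the second dimension counts features and that the model learns one coefficient per feature.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

organic = pd.DataFrame({
    "Growth": [44, 30, 55, 50, 78, 81, 59],
    "Sunlight": [611, 507, 994, 1014, 850, 551, 506],
    "Plants": [26, 14, 25, 23, 49, 50, 29],
})

y = organic["Growth"]

# One feature: the Series' underlying array reshaped to (n_samples, 1)
X_one = organic["Sunlight"].values.reshape(-1, 1)
# Two features: selecting a list of columns already yields (n_samples, 2)
X_two = organic[["Sunlight", "Plants"]]

model_one = LinearRegression().fit(X_one, y)
model_two = LinearRegression().fit(X_two, y)

print(X_one.shape, model_one.coef_.shape)  # (7, 1) (1,)
print(X_two.shape, model_two.coef_.shape)  # (7, 2) (2,)
```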

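Answer 2 (score: 0)

After you pass the data to train_test_split(X, y, test_size=0.2), it returns X_train and X_test as pandas Series with shapes (192,) and (49,). As mentioned in the previous answers, sklearn expects a matrix of shape [n_samples, n_features] for the X_train and X_test data. You can simply convert the pandas Series X_train and X_test to pandas DataFrames, which changes their shapes to (192, 1) and (49, 1):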

lm = linear_model.LinearRegression()
model = lm.fit(X_train.to_frame(), y_train)
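The same conversion is needed whenever the model sees X again, for example in predict. The sketch below runs the whole to_frame() approach end to end on a DataFrame built from the seven sample rows; random_state=0 is an added assumption, used only so the split is reproducible.

```python
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

organic = pd.DataFrame({
    "Growth": [44, 30, 55, 50, 78, 81, 59],
    "Sunlight": [611, 507, 994, 1014, 850, 551, 506],
})

y = organic["Growth"]
X = organic["Sunlight"]  # still a 1D Series here

# random_state is fixed only so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lm = linear_model.LinearRegression()
# to_frame() turns each Series into a one-column DataFrame
model = lm.fit(X_train.to_frame(), y_train)
# The same conversion is needed at prediction time
predictions = model.predict(X_test.to_frame())

print(X_train.to_frame().shape, X_test.to_frame().shape)  # (5, 1) (2, 1)
print(predictions.shape)                                  # (2,)
```

Selecting X = organic[['Sunlight']] up front would avoid the conversion entirely, since train_test_split then returns DataFrames.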