ValueError:预期的2D数组,而是1D数组:

时间:2018-07-03 08:41:00

标签: python-3.x scikit-learn linear-regression

在练习简单线性回归模型时,我遇到了这个错误, 我认为我的数据集有问题。

Here is my data set:

Here is independent variable X:

Here is dependent variable Y:

Here is X_train

Here Is Y_train

这是错误正文:

For o = 72 To lastrow 

    Dim refqst As String
    refqst = wss.Cells(o, 1).Value
    If Not refqst = "" Then
        If InStr(refqst, ".") > 0 Then
            Dim Findrange As Range
            Dim newqst As String
            Dim newqstadrs As Range
            Dim lstqstadrs As Range

            If newqst = refqst Then
                Set newqstadrs = Findrange.FindNext(after:=lstqstadrs)
            Else

                Select Case Left(refqst, 1)
                    Case 1
                        Set Findrange = wsa.Range(wsa.Cells(4, gewaskolom), wsa.Cells(11, gewaskolom))
                    'some more cases here
                End Select
                Set newqstadrs = Findrange.Find(refqst, LookIn:=xlValues, lookat:=xlWhole)
            End If

            If newqstadrs Is Nothing Then
            Else
                newqst = newqstadrs.Value
                Dim newrow As Long
                newrow = Findrange.Find(refqst, LookIn:=xlValues, lookat:=xlWhole).row
                Dim lstqst As String

                If Not wsa.Cells(newrow, 1) = "" Then
                    'do some stuff         
                    lstqst = refqst
                    Set lstqstadrs = newqstadrs

                ElseIf Not wsa.Cells(newrow, 2) = "" Then

                    Dim FindRangeTwo As Range
                    Set FindRangeTwo = wsa.Range(wsa.Cells(newrow, gewaskolom), wsa.Cells(wsa.Range("B" & newrow).End(xlDown).row, gewaskolom))
                    Dim SearchRange As Range
                    Set SearchRange = wss.Range(wss.Cells(o + 1, 1), wss.Cells(wss.Range("B" & o).End(xlDown).row, 1))
                    Dim rCell As Range
                    For Each rCell In SearchRange
                        Dim newrowtwo As Long
                        newrowtwo = FindRangeTwo.Find(rCell.Value, LookIn:=xlValues, lookat:=xlWhole).row
                        'do some more stuff
                    Next rCell
                    lstqst = refqst
                    Set lstqstadrs = newqstadrs
                End If
            End If            

        End If
    End If
Next o

这是我的代码:

ValueError: Expected 2D array, got 1D array instead:
array=[ 7.   8.4 10.1  6.5  6.9  7.9  5.8  7.4  9.3 10.3  7.3  8.1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

谢谢

7 个答案:

答案 0 :(得分:2)

如果您查看LinearRegression of scikit-learn的文档。

  

fit(X,y,sample_weight = None)

     

X:形状为[n_samples,n_features]的numpy数组或稀疏矩阵

     

预测(X)

     

X:{类似数组的稀疏矩阵},形状=(n_samples,n_features)

您可以看到X有2个维度,而x_trainx_test显然有1个维度。 根据建议,添加:

x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)

在拟合和预测模型之前。

答案 1 :(得分:1)

您需要同时给fitpredict方法提供2D数组。您的x_trainy_trainx_test当前仅为一维。控制台建议的内容应该可以工作:

x_train= x_train.reshape(-1, 1)
y_train= y_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)

这使用numpy的reshape。过去已经回答了有关reshape的问题,例如应回答reshape(-1,1)的含义:What does -1 mean in numpy reshape?

答案 2 :(得分:0)

这是你的答案。
使用:left: 50%; transform: translateX(-50%);

会有所帮助。

答案 3 :(得分:0)

这是解决方案

regressor.predict([[x_test]])

对于多项式回归:

regressor_2.predict(poly_reg.fit_transform([[x_test]]))

答案 4 :(得分:0)

我建议在进行训练和测试数据集拆分之前首先重塑X:

import pandas as pd
import matplotlib as pt

#import data set

dataset = pd.read_csv('Sample-data-sets-for-linear-regression1.csv')
x = dataset.iloc[:, 1].values
y = dataset.iloc[:, 2].values
# Here is the trick
x = x.reshape(-1,1)

#Spliting the dataset into Training set and Test Set
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state=0)

#linnear Regression

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x_train,y_train)

y_pred = regressor.predict(x_test)

答案 5 :(得分:0)

这是我用的

X_train = X_train.values.reshape(-1, 1)
y_train = y_train.values.reshape(-1, 1)
X_test = X_test.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)

答案 6 :(得分:0)

很多时候在做线性回归问题的时候,人们喜欢想象这个图

one variable input linear regression

在输入上,我们有一个 X = [1,2,3,4,5]

然而,许多回归问题都有多维输入。考虑对房价的预测。这不是决定房价的一种属性。它具有多种功能(例如:房间数量、位置等)

如果你查看文档,你会看到这个 screenshot from documentation

它告诉我们行由样本组成,而列由特征组成。

Description of Input

但是,考虑一下当他将一个特征作为我们的输入时会发生什么。然后我们需要一个 n x 1 维的输入,其中 n 是样本数,第 1 列代表我们唯一的特征。

为什么 array.reshape(-1, 1) 建议有效? -1 表示根据提供的列数选择有效的行数。请参阅图像以了解它在输入中的变化。 Transformation using array.reshape