ValueError: Found arrays with inconsistent numbers of samples [1, 299]

Time: 2016-02-07 08:09:14

Tags: python numpy pandas machine-learning scikit-learn

The data files are here and here. You can download them by clicking the links. I am using Pandas, NumPy, and Python 3.

Here is my code:

import pandas as pa
import numpy as nu
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

def get_accuracy(X_train, y_train, X_test, y_test):
    perceptron = Perceptron()
    perceptron.fit(X_train, y_train)
    perceptron.transform(X_train)
    prediction = perceptron.predict(X_test)
    result = accuracy_score(y_test, prediction)
    return result

test_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-test.csv")
test_data.columns = ["class", "f1", "f2"]
train_data = pa.read_csv("C:/Users/Roman/Downloads/perceptron-train.csv")
train_data.columns = ["class", "f1", "f2"]

scaler = StandardScaler()
scaler.fit_transform(train_data[train_data.columns[1:]]).reshape(-1,1)
X_train = scaler.transform(train_data[train_data.columns[1:]])

scaler.fit_transform(train_data[train_data.columns[0]])
y_train = scaler.transform(train_data[train_data.columns[0]])

scaler.fit_transform(test_data[test_data.columns[1:]])
X_test = scaler.transform(test_data[test_data.columns[1:]])

scaler.fit_transform(test_data[test_data.columns[0]])
y_test = scaler.transform(test_data[test_data.columns[0]])




scaled_accuracy = get_accuracy(nu.ravel(X_train), nu.ravel(y_train),    nu.ravel(X_test), nu.ravel(y_test))
print(scaled_accuracy)

Here is the error I get:

Traceback (most recent call last):
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 33, in <module>
    scaled_accuracy = get_accuracy(nu.ravel(X_train), nu.ravel(y_train), nu.ravel(X_test), nu.ravel(y_test))
  File "C:/Users/Roman/PycharmProjects/data_project-1/lecture_2_perceptron.py", line 9, in get_accuracy
    perceptron.fit(X_train, y_train)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\linear_model\stochastic_gradient.py", line 545, in fit
    sample_weight=sample_weight)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\linear_model\stochastic_gradient.py", line 389, in _fit
    X, y = check_X_y(X, y, 'csr', dtype=np.float64, order="C")
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 520, in check_X_y
    check_consistent_length(X, y)
  File "C:\Users\Roman\AppData\Roaming\Python\Python35\site-packages\sklearn\utils\validation.py", line 176, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [  1 299]

Without scaling the data, everything works fine. After scaling, it does not.

1 Answer:

Answer 0: (score: 0)

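You should not call fit_transform every time you use the scaler. You should fit once, on the training data, and afterwards only transform; otherwise you get different representations for the training and test data (leading to the error you report). It also makes no sense to scale the labels.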
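A minimal sketch of that advice, assuming the data has the columns class, f1, and f2 as in the question. The make_frame helper and its synthetic data are hypothetical stand-ins for the two CSV files so that the snippet runs on its own; the row counts and the random_state value are likewise arbitrary. The features are also kept 2-D here: flattening X with ravel, as the question does, is likely what makes scikit-learn see 1 sample against 299 labels.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler


def make_frame(n_rows, seed):
    """Hypothetical stand-in for one CSV file: columns class, f1, f2."""
    rng = np.random.RandomState(seed)
    # Two features on very different scales, so scaling has something to do.
    features = rng.normal(size=(n_rows, 2)) * [1.0, 100.0]
    labels = np.where(features[:, 0] + features[:, 1] / 100.0 > 0, 1, -1)
    return pd.DataFrame(
        {"class": labels, "f1": features[:, 0], "f2": features[:, 1]}
    )


def scaled_accuracy(train_data, test_data):
    # Features stay 2-D with shape (n_samples, n_features); no ravel.
    X_train = train_data[["f1", "f2"]].values
    X_test = test_data[["f1", "f2"]].values
    # Labels are left unscaled, as 1-D arrays.
    y_train = train_data["class"].values
    y_test = test_data["class"].values

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # fit once, on training data
    X_test_scaled = scaler.transform(X_test)        # reuse training mean and std

    perceptron = Perceptron(random_state=0)
    perceptron.fit(X_train_scaled, y_train)
    prediction = perceptron.predict(X_test_scaled)
    return accuracy_score(y_test, prediction)


train_data = make_frame(300, seed=0)
test_data = make_frame(200, seed=1)
accuracy = scaled_accuracy(train_data, test_data)
print(accuracy)
```

With the real files, the two make_frame calls would be replaced by the read_csv calls from the question; the rest should carry over unchanged as long as the column names match.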