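I'm experimenting with a new approach to regression and have been interested in using a support vector machine for it. I fit the model on training data so that it can predict the test data. When it predicts the training data, the output is: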
array([ 8.00000947, 8.10000947, 7.90000947, 8.40000947, 8.50000947,
8.10000947, 7.90000947, 8.20000947, 8.40000947, 8.20000948,
8.40000948, 8.40000947, 8.00000947, 8.10000947, 8.50000948,
8.40000947, 8.60000947, 8.40000948, 8.40000948, 8.00000947,
8.50000948, 8.30000948, 7.99922823, ...
But when it predicts the test data, every prediction is the same value:
array([ 7.92969697, 7.92969697, 7.92969697, 7.92969697, 7.92969697,
7.92969697, 7.92969697, 7.92969697, 7.92969697, 7.92969697,
7.92969697, 7.92969697, 7.92969697, ...
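One likely mechanism, sketched here under the assumption that the features are on large, unscaled ranges (the post does not show them): scikit-learn's user guide gives the RBF-kernel SVR prediction as a kernel-weighted sum over support vectors plus an intercept, with the dual coefficients exposed as `clf.dual_coef_` and $b$ as `clf.intercept_`.

$$
f(\mathbf{x}) = \sum_{i \in \mathrm{SV}} (\alpha_i - \alpha_i^*)\, K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x} \rVert^2\right)
$$

If $\gamma \lVert \mathbf{x}_i - \mathbf{x} \rVert^2$ is large for every support vector, every kernel term is close to 0 and $f(\mathbf{x}) \approx b$, the same constant for every test row. On a training point, $K(\mathbf{x}_i, \mathbf{x}_i) = 1$ while the other terms vanish, so $f(\mathbf{x}_i) \approx (\alpha_i - \alpha_i^*) + b$ can land very close to $y_i$, which would be consistent with the near-exact training predictions above.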
Here is the code:
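import numpy as np
import pandas as pd
from sklearn import svm
df = pd.read_excel('featureset.xlsx')
print(df.shape)  # (280, 23)
# Replace the column labels with integer codes 0, 1, ..., 22
df.columns = pd.factorize(df.columns)[0]
# The original snippet never defines Y (it only sets Y.columns = [51]).
# Assumption: the target is the last column and the other 22 columns are the features.
X = df.iloc[:, :-1]
Y = df.iloc[:, -1]
# First 196 rows for training, remaining 84 rows for testing
X_train, X_test = X[:196], X[196:]
y_train, y_test = Y[:196], Y[196:]
print(y_train.shape)  # (196,)
print(y_test.shape)   # (84,)
# degree only affects kernel='poly'; it is ignored when kernel='rbf'
clf = svm.SVR(kernel='rbf', C=1e3, degree=2)
y_pred = clf.fit(X_train, y_train).predict(X_test)
print(y_pred)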
which results in:
array([ 7.92969697, 7.92969697, 7.92969697, 7.92969697, 7.92969697, ...
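A minimal sketch that reproduces the same pattern and shows a common remedy, standardizing the features before the RBF kernel. Everything here is an assumption rather than the asker's setup: the data are hypothetical synthetic values (not `featureset.xlsx`), the 0 to 10000 feature range is invented to mimic unscaled inputs, and `gamma='auto'` (1 / n_features, the default in scikit-learn releases before 0.22) is set explicitly to match older versions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical synthetic stand-in for featureset.xlsx: 280 rows,
# 22 features on a large, unscaled range, target around 8.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10000, size=(280, 22))
y = 8.0 + 0.00005 * X[:, 0] + rng.normal(scale=0.05, size=280)

# Same 196/84 split as in the question
X_train, X_test = X[:196], X[196:]
y_train, y_test = y[:196], y[196:]

# Unscaled features with gamma='auto': distinct points are so far apart
# that the RBF kernel between them underflows to 0.
raw = SVR(kernel='rbf', C=1e3, gamma='auto').fit(X_train, y_train)
raw_train_pred = raw.predict(X_train)
raw_test_pred = raw.predict(X_test)
print(np.round(raw_train_pred[:5] - y_train[:5], 3))  # residuals stay within about epsilon=0.1
print(raw_test_pred[:5], raw.intercept_)              # every test prediction equals b

# Standardize each feature first; the pipeline fits the scaler on the
# training rows only and reuses it when predicting.
scaled = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=1e3))
scaled.fit(X_train, y_train)
scaled_test_pred = scaled.predict(X_test)
print(scaled_test_pred[:5])  # predictions now differ from row to row
```

Scaling alone does not make the model accurate; `C`, `gamma`, and `epsilon` would still need tuning, for example with `GridSearchCV` on the training rows only.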