How do I use the matrix result of PCA to train a model?

Asked: 2017-02-10 04:46:09

Tags: python-3.x scikit-learn pca

Hello, I want to reduce the dimensionality of my training matrix and then use a support vector machine. My code is as follows:

from sklearn.decomposition import PCA

First, I set up the PCA and built the training matrix:

pca = PCA(n_components=100)
#pca.fit(train_matrix)
train_matrix = np.concatenate([cities,state_matrix,work_type,company_matrix,seg,ag,rep], axis=1)

Then I assigned the transformed result to a variable and went on to train my model as follows:

train_matrix = pca.fit_transform(train_matrix)


from sklearn.ensemble import RandomForestClassifier
from sklearn import preprocessing
X_train, X_test, y_train, y_test = train_test_split(
    pca, labels_list, test_size=0.1, random_state=47)

But I am not sure what is wrong, so I would appreciate help getting past this error:

state shape:  (282521, 572)
work type shape:  (282521, 164)
train matrix shape (5000, 100)
Traceback (most recent call last):
  File "build_model.py", line 61, in <module>
    pca, labels_list, test_size=0.1, random_state=47)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/cross_validation.py", line 2039, in train_test_split
    arrays = indexable(*arrays)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 177, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 177, in <listcomp>
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 126, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(PCA(copy=True, iterated_power='auto', n_components=100, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False), dtype=object) cannot be considered a valid collection.

1 Answer:

Answer 0: (score: 1)

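You are passing pca, the PCA estimator object itself, to train_test_split. Check the parameters of train_test_split: it expects array-like data, so pass the transformed data (train_matrix) to it instead.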

The correct code should be:

X_train, X_test, y_train, y_test = train_test_split(
    train_matrix, labels_list, test_size=0.1, random_state=47)
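
For anyone who wants to try the whole flow end to end, here is a minimal sketch. It is not the asker's actual script: the random arrays, their sizes, n_components=10, the variable name reduced, and the default SVC are all assumptions chosen for illustration. It also imports train_test_split from sklearn.model_selection, since the sklearn.cross_validation module shown in the traceback no longer exists in current scikit-learn releases.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins for the question's data (assumption: 200 samples, 50 features).
rng = np.random.RandomState(47)
train_matrix = rng.rand(200, 50)
labels_list = rng.randint(0, 2, size=200)

# pca is the estimator; fit_transform returns the reduced data as a NumPy array.
pca = PCA(n_components=10)
reduced = pca.fit_transform(train_matrix)

# Pass the transformed array (not the pca object) to train_test_split.
X_train, X_test, y_train, y_test = train_test_split(
    reduced, labels_list, test_size=0.1, random_state=47)

# Train a support vector machine on the reduced features.
clf = SVC()
clf.fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

The point to notice is that pca and reduced are different things: the first is the fitted transformer, the second is the matrix you split and train on.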