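I have a dataset whose samples can be labelled either as classes 1-4 or as inlier/outlier.
I am using LDA() as part of a pipeline to find the dimensions that maximise the separation between the four classes, but afterwards I want to use the inlier/outlier (-1/1) labels for OneClassSVM().
The aim is to run a GridSearch, so unless I am mistaken, I need both the preprocessing steps and the classifier in a single pipeline?
Here is a reproducible example: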
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.svm import OneClassSVM as OCSVM
import pandas as pd
# create four different clusters
X, y1 = make_blobs(centers = 4, cluster_std=2, n_samples=1000, n_features=10, random_state=2)
class_dict = {0:-1, 1:1, 2:-1, 3:-1} # define a dictionary to map original class labels to inlier / outlier
y2 = list(map(class_dict.get, y1)) # use dictionary
pipe = Pipeline([('scaler', StandardScaler()),
                 ('lda', LDA(n_components=3, solver="eigen", shrinkage="auto")),
                 ('lda_scaler', StandardScaler()),
                 ('ocsvm', OCSVM(kernel="rbf", nu=0.1, gamma="auto"))
                 ])
pipe.fit(X, y2, ocsvm__y=y1)
Running it raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in
     24 ])
     25
---> 26 pipe.fit(X, y2, ocsvm__y=y1)
     27
     28
~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    352                 self._log_message(len(self.steps) - 1)):
    353             if self._final_estimator != 'passthrough':
--> 354                 self._final_estimator.fit(Xt, y, **fit_params)
    355         return self
    356
TypeError: fit() got multiple values for argument 'y'

I think what is happening is that y ends up being passed twice to the final estimator's fit function (once as the pipeline's own y and once through ocsvm__y), but I don't know how to fix it. Any ideas?
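One possible way around this, sketched under a few assumptions: OneClassSVM.fit accepts y only for API consistency and ignores it, so the ocsvm__y argument may not be needed at all. The pipeline could be fitted with the four-class labels (which LDA uses), and the inlier/outlier labels could enter only at scoring time through a custom scorer. The scorer name inlier_f1, the choice of F1 as the metric, and the parameter grid are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM as OCSVM

# Same data and label mapping as in the reproducible example
X, y1 = make_blobs(centers=4, cluster_std=2, n_samples=1000,
                   n_features=10, random_state=2)
class_dict = {0: -1, 1: 1, 2: -1, 3: -1}  # class 1 is the inlier class

pipe = Pipeline([('scaler', StandardScaler()),
                 ('lda', LDA(n_components=3, solver="eigen", shrinkage="auto")),
                 ('lda_scaler', StandardScaler()),
                 ('ocsvm', OCSVM(kernel="rbf", nu=0.1, gamma="auto"))])

# Pass only the four-class labels: LDA uses them, OneClassSVM ignores y
pipe.fit(X, y1)
pred = pipe.predict(X)  # array of -1 (outlier) / 1 (inlier)


def inlier_f1(estimator, X, y_multi):
    """Hypothetical scorer: map four-class labels to -1/1, then score with F1."""
    y_true = np.array([class_dict[c] for c in y_multi])
    return f1_score(y_true, estimator.predict(X), pos_label=1)


# Illustrative grid (assumed values); keys follow the <step>__<param> convention
param_grid = {"ocsvm__nu": [0.05, 0.1, 0.2],
              "ocsvm__gamma": ["scale", "auto"]}

search = GridSearchCV(pipe, param_grid, scoring=inlier_f1, cv=3)
search.fit(X, y1)  # y1 reaches LDA during fitting and the scorer during evaluation
print(search.best_params_)
```

With this layout the inlier/outlier labels never go through fit, so nothing is passed to OneClassSVM twice. Whether F1 on the inlier class is the right selection criterion depends on the actual problem.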