Question

I have a dataset that can be labeled either as classes 1-4 or as outlier/inlier, e.g.

Class 1 (outlier)
Class 2 (inlier)
Class 3 (outlier)
Class 4 (outlier)

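I use LDA() as part of a pipeline to find the dimensions that maximize the separation between the four classes, but afterwards I would like to use the outlier/inlier (-1/1) labels for OneClassSVM().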

The goal is to run a GridSearch, so unless I am mistaken, I need both the preprocessing steps and the classifier in a single pipeline?

Here is a reproducible example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.svm import OneClassSVM as OCSVM
import pandas as pd


# create four different clusters
X, y1 = make_blobs(centers = 4, cluster_std=2, n_samples=1000, n_features=10, random_state=2)

class_dict = {0:-1, 1:1, 2:-1, 3:-1} # define a dictionary to map original class labels to inlier / outlier

y2 = list(map(class_dict.get, y1)) # use dictionary

pipe = Pipeline([('scaler', StandardScaler()),
                 ('lda', LDA(n_components=3, solver="eigen", shrinkage="auto")),
                 ('lda_scaler', StandardScaler()),
                 ('ocsvm', OCSVM(kernel="rbf", nu=0.1, gamma="auto"))
                ])

pipe.fit(X, y2, ocsvm__y=y1)

Which gives the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in
     24                 ])
     25
---> 26 pipe.fit(X, y2, ocsvm__y=y1)
     27
     28

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\pipeline.py in fit(self, X, y, **fit_params)
    352                                  self._log_message(len(self.steps) - 1)):
    353             if self._final_estimator != 'passthrough':
--> 354                 self._final_estimator.fit(Xt, y, **fit_params)
    355         return self
    356

TypeError: fit() got multiple values for argument 'y'

I think what is happening is that y is defined twice in the fit call (i.e. pipe.fit(X, y2, ocsvm__y=y1)), but I don't know how to fix it. Any ideas?
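For context, the traceback shows Pipeline.fit calling self._final_estimator.fit(Xt, y, **fit_params). The pipeline therefore already hands its own y to the last step positionally, and ocsvm__y adds a second y as a keyword. One possible way around this, sketched below as an untested idea rather than a confirmed fix, relies on OneClassSVM.fit ignoring its y argument: fit the pipeline on the four-class labels so that LDA sees them, and convert to -1/1 only when scoring. The helper names to_inlier_labels and inlier_balanced_accuracy, the choice of balanced accuracy, and the nu grid are all assumptions made for illustration. The sketch only covers how the labels are passed around, not whether the resulting detector is any good.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM as OCSVM

# same data and label mapping as in the question
X, y1 = make_blobs(centers=4, cluster_std=2, n_samples=1000,
                   n_features=10, random_state=2)
class_dict = {0: -1, 1: 1, 2: -1, 3: -1}


def to_inlier_labels(y):
    """Hypothetical helper: map four-class labels to inlier (1) / outlier (-1)."""
    return np.array([class_dict[c] for c in y])


def inlier_balanced_accuracy(y_true, y_pred):
    """Hypothetical metric: compare -1/1 predictions with the mapped class labels."""
    return balanced_accuracy_score(to_inlier_labels(y_true), y_pred)


pipe = Pipeline([('scaler', StandardScaler()),
                 ('lda', LDA(n_components=3, solver="eigen", shrinkage="auto")),
                 ('lda_scaler', StandardScaler()),
                 ('ocsvm', OCSVM(kernel="rbf", nu=0.1, gamma="auto"))])

# a single y: LDA uses the four classes, OneClassSVM.fit ignores y
pipe.fit(X, y1)
pred = pipe.predict(X)  # array of -1 / 1

# grid search on the same pipeline; the scorer does the -1/1 mapping
search = GridSearchCV(pipe,
                      param_grid={"ocsvm__nu": [0.05, 0.1, 0.2]},  # assumed grid
                      scoring=make_scorer(inlier_balanced_accuracy),
                      cv=3)
search.fit(X, y1)
print(search.best_params_)
```

With this arrangement the grid search only ever receives one target (y1), and the outlier/inlier labels enter solely through the scoring function.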

Using multiple targets in a pipeline

0 answers: