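I am running a pipeline that standardises the inputs, runs PCA, normalises the PCA factors, and finally fits a logistic regression.
However, the confusion matrix it produces varies from run to run.
I found that if I remove the third step ("normalise_pca"), my results stay the same.
I have set random_state=0 on every pipeline step that accepts it. Any idea why I am getting variable results?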
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, RobustScaler

def exp2_classifier(X_train, y_train):
    estimators = [('robust_scaler', RobustScaler()),
                  ('reduce_dim', PCA(random_state=0)),
                  # applied because the distribution of the PCA factors was skewed
                  ('normalise_pca', PowerTransformer()),
                  # solver specified to suppress warnings; it doesn't seem to affect the grid search
                  ('clf', LogisticRegression(random_state=0, solver="liblinear"))]
    pipe = Pipeline(estimators)
    return pipe
exp2_eval = Evaluation().print_confusion_matrix
logit_grid = Experiment().run_experiment(asdp.data, "heavy_drinker", exp2_classifier, exp2_eval);
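Answer 0 (score: 1)
I cannot reproduce your problem. I tried the same pipeline on one of sklearn's example datasets and got consistent results across multiple runs, so the variation is probably not caused by normalise_pca.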
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, RobustScaler

cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# fixing random_state here keeps the train/test split identical between runs
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.2, random_state=42)

estimators = [('robust_scaler', RobustScaler()),
              ('reduce_dim', PCA(random_state=0)),
              # applied because the distribution of the PCA factors was skewed
              ('normalise_pca', PowerTransformer()),
              # solver specified to suppress warnings; it doesn't seem to affect the grid search
              ('clf', LogisticRegression(random_state=0, solver="liblinear"))]
pipe = Pipeline(estimators)
pipe.fit(X_train, y_train)

print('train data :')
print(confusion_matrix(y_train, pipe.predict(X_train)))
print('test data :')
print(confusion_matrix(y_eval, pipe.predict(X_eval)))
Output:
train data :
[[166 3]
[ 4 282]]
test data :
[[40 3]
[ 3 68]]
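One way to narrow down where the variation comes from is to refit the pipeline several times and compare the confusion matrices. The sketch below is only an illustration on the same breast cancer dataset: the helper names `build_pipe` and `run_once` and the `split_seed` parameter are made up for this example, and it assumes (without being able to confirm it) that the variation in the question comes from somewhere outside the pipeline, such as an unseeded split or cross-validation inside `run_experiment`.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, RobustScaler


def build_pipe():
    # Hypothetical helper: rebuilds the pipeline from the question.
    return Pipeline([
        ('robust_scaler', RobustScaler()),
        ('reduce_dim', PCA(random_state=0)),
        ('normalise_pca', PowerTransformer()),
        ('clf', LogisticRegression(random_state=0, solver="liblinear")),
    ])


def run_once(X, y, split_seed):
    # Hypothetical helper: split, fit a fresh pipeline, and return the
    # confusion matrix on the held-out data.
    X_train, X_eval, y_train, y_eval = train_test_split(
        X, y, test_size=0.2, random_state=split_seed)
    pipe = build_pipe().fit(X_train, y_train)
    return confusion_matrix(y_eval, pipe.predict(X_eval))


X, y = datasets.load_breast_cancer(return_X_y=True)

# Same split seed every time: any difference would have to come from the pipeline.
fixed = [run_once(X, y, split_seed=42) for _ in range(5)]
all_identical = all(np.array_equal(fixed[0], m) for m in fixed[1:])
print('identical with a fixed split:', all_identical)

# Different split seeds: the matrices may differ even though the pipeline is unchanged.
varied = [run_once(X, y, split_seed=s) for s in range(5)]
for s, m in enumerate(varied):
    print('split seed', s, m.tolist())
```

If the matrices are identical with a fixed split but change once the split seed changes, the pipeline steps themselves are behaving deterministically, and the place to look is whatever splits or shuffles the data before the pipeline is fitted.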