How can I increase the Logistic Regression score with sklearn PCA?

Time: 2019-06-03 00:10:44

Tags: python machine-learning scikit-learn deep-learning pca

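I want to compare LeNet and PCA on image recognition, so I used the German Traffic Sign Benchmark and the sklearn PCA module. When I test the result with Logistic Regression, however, the score never gets above 6%, no matter what I try.

I tried changing the number of iterations and the preprocessing (using normalization and equalization), but it still does not work.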

The data is loaded with pickle from the following three files:

train.p, with shape of (34799, 32, 32, 3)
test.p, with shape of (12630, 32, 32, 3)
valid.p, with shape of (4410, 32, 32, 3)

Each set has its own labels, named y_train, y_test and y_valid. This is the relevant part of the code:

import cv2
import numpy as np

def gray_scale(image):
    """
    Convert an RGB image to gray scale.
        Parameters:
            image: An np.array compatible with plt.imshow.
    """
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

def preprocess2(data):
    """Convert a batch of RGB images (n, h, w, 3) to gray scale (n, h, w, 1)."""
    n_images, height, width = data.shape[:3]
    gray_images = np.zeros((n_images, height, width))
    for i, img in enumerate(data):
        gray_images[i] = gray_scale(img)
    # Add a trailing channel axis: (n, h, w) -> (n, h, w, 1)
    gray_images = gray_images[..., None]
    return gray_images

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pca = PCA(0.95)

X_train_preprocess = preprocess2(X_train)
#Removing one dimension (34799,32,32,1) to (34799,32,32)
X_train_preprocess = X_train_preprocess.reshape(34799,32,32)
nsamples, nx, ny = X_train_preprocess.shape
X_train_preprocess = X_train_preprocess.reshape((nsamples,nx*ny))

X_test_preprocess = preprocess2(X_test)
#Removing one dimension (12630,32,32,1) to (12630,32,32)
X_test_preprocess = X_test_preprocess.reshape(12630,32,32)
n2samples, n2x, n2y = X_test_preprocess.shape
X_test_preprocess = X_test_preprocess.reshape((n2samples,n2x*n2y))

print(X_train_preprocess.shape)
pca.fit(X_train_preprocess)
print(pca.n_components_)
scaler = StandardScaler()
scaler.fit(X_train_preprocess)
X_t_train = scaler.transform(X_train_preprocess)
X_t_test = scaler.transform(X_test_preprocess)

X_t_train = pca.transform(X_t_train)
X_t_test = pca.transform(X_t_test)

from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression(solver = 'lbfgs', max_iter = 5000)
logisticRegr.fit(X_t_train, y_train)
print('predictions', logisticRegr.predict(X_t_test[0:10]))
print('score', logisticRegr.score(X_t_test, y_test))

So I would like to know whether you can help me understand what I am doing wrong and what I should do to make this work.

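1 answer:

Answer 0: (score: 0)

In image recognition your data is 2D, so it would be better to use a CNN, which can capture the higher-dimensional relationships between pixels.

Related link: Training CNN with images in sklearn neural net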
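
Apart from the choice of model, one detail in the code posted in the question stands out: `pca.fit` is called on the raw `X_train_preprocess`, while `pca.transform` is later applied to the output of `StandardScaler`. The PCA components are therefore learned in a different feature space from the one they are applied to, which may explain at least part of the low score. Below is a minimal sketch of a consistent ordering. It uses scikit-learn's bundled `load_digits` images as a stand-in dataset (an assumption made only so the example runs on its own; it is not the German traffic sign data), and the 25% split and `random_state` are likewise illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8x8 grayscale digit images, not the traffic sign set.
digits = load_digits()
images, labels = digits.images, digits.target

# Flatten (n, h, w) -> (n, h*w) without hard-coding the sample count.
X = images.reshape(len(images), -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# The pipeline fits the scaler first, then fits PCA on the *scaled* training
# data, and applies the same fitted steps in the same order at test time.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components for 95% of the variance
    LogisticRegression(max_iter=5000),
)
model.fit(X_train, y_train)

test_score = model.score(X_test, y_test)
print("components kept:", model.named_steps["pca"].n_components_)
print("test accuracy:", test_score)
```

With the question's own arrays, the same pipeline could be fit on the flattened grayscale `X_train_preprocess` and scored on `X_test_preprocess`. Wrapping the steps in a pipeline makes it hard to fit one step on data that another step has not yet transformed.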