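I read that because LLE (a manifold learning method) is much slower than PCA, it is sensible to apply PCA first and then use LLE when reducing a large number of dimensions.
I want to reduce the dimensionality of the digits in the MNIST dataset (starting from 28x28). Here is my code: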
import matplotlib.pyplot as plt
import matplotlib as mpl
def showdigit(digit):
    plt.imshow(digit, cmap = mpl.cm.binary, interpolation="nearest")
    plt.axis("off")
some_digit = X[10000]
some_digit = some_digit.reshape(28, 28)
showdigit(some_digit)
plt.show()
(This displays an image of the original digit.)
#PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(some_digit)
showdigit(X_reduced)
I get an image of X_reduced, which has shape (26, 8).
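For reference, scikit-learn estimators interpret a 2-D input as (n_samples, n_features), so a single 28x28 image is read as 28 samples with 28 features each. The sketch below only illustrates that shape convention; `fake_digit` is a hypothetical random stand-in for one reshaped image, not real MNIST data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for one reshaped 28x28 image (random values, not MNIST).
rng = np.random.default_rng(0)
fake_digit = rng.random((28, 28))

# A float n_components keeps enough components to explain 95% of the variance.
pca_demo = PCA(n_components=0.95)
reduced_demo = pca_demo.fit_transform(fake_digit)

# Each row of the image was treated as a separate sample,
# so the output still has 28 rows and only the column count shrinks.
print(fake_digit.shape, "->", reduced_demo.shape)
print("components kept:", pca_demo.n_components_)
```

The number of rows is unchanged because PCA reduces features (columns), not samples (rows).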
Now I want to apply LLE to the output of PCA:
#Manifold
from sklearn.manifold import LocallyLinearEmbedding
lle = LocallyLinearEmbedding(n_components=20, n_neighbors=10)
X_redu = lle.fit_transform(X_reduced)
showdigit(X_redu)
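I get the error: "ValueError: output dimension must be less than or equal to input dimension".
If I change n_components to, for example, 0.9, I get: "ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (28,0,)".
Can you tell me what is going wrong here?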
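For comparison, here is a minimal sketch of the PCA-then-LLE idea from the first paragraph. It assumes both estimators are fit on a 2-D array with one flattened image per row, and it uses scikit-learn's small bundled digits set (8x8 images) as a stand-in for MNIST so that it runs without a download. The 500-row subset, `n_components=2`, and `eigen_solver="dense"` are illustrative choices, not taken from the code above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

# Assumption: the bundled 8x8 digits set stands in for MNIST.
# Each row is one flattened image, i.e. shape (n_samples, n_features).
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small[:500]  # small subset to keep LLE fast

# Step 1: PCA keeps enough components to explain 95% of the variance.
pca_step = PCA(n_components=0.95)
X_pca = pca_step.fit_transform(X_small)

# Step 2: LLE on the PCA output. n_components must be an integer that is
# no larger than the number of columns produced by PCA.
lle_step = LocallyLinearEmbedding(
    n_components=2, n_neighbors=10, eigen_solver="dense", random_state=42
)
X_lle = lle_step.fit_transform(X_pca)

print(X_small.shape, "->", X_pca.shape, "->", X_lle.shape)
```

In this arrangement the number of samples stays fixed through both steps, and only the number of features per sample shrinks.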
P.S. This is the first cell of the code (taken from handson-ml):
import numpy as np
import os
def sort_by_target(mnist):
    reorder_train = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[:60000])]))[:, 1]
    reorder_test = np.array(sorted([(target, i) for i, target in enumerate(mnist.target[60000:])]))[:, 1]
    mnist.data[:60000] = mnist.data[reorder_train]
    mnist.target[:60000] = mnist.target[reorder_train]
    mnist.data[60000:] = mnist.data[reorder_test + 60000]
    mnist.target[60000:] = mnist.target[reorder_test + 60000]
try:
    from sklearn.datasets import fetch_openml
    mnist = fetch_openml('mnist_784', version=1, cache=True)
    mnist.target = mnist.target.astype(np.int8) # fetch_openml() returns targets as strings
    sort_by_target(mnist) # fetch_openml() returns an unsorted dataset
except ImportError:
    from sklearn.datasets import fetch_mldata
    mnist = fetch_mldata('MNIST original')
mnist["data"], mnist["target"]
X, y = mnist["data"], mnist["target"]