I am running principal component analysis on the iris data with the following code:
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

iris = datasets.load_iris()
dat = pd.DataFrame(data=iris.data, columns=['sl', 'sw', 'pl', 'pw'])
stddat = scale(dat)  # standardize each column to zero mean and unit variance
pca = PCA(n_components=2)
pc_out = pca.fit_transform(stddat)
pcdf = pd.DataFrame(data=pc_out, columns=['PC-1', 'PC-2'])
print(pcdf.head())
Output:
       PC-1      PC-2
0 -2.264542  0.505704
1 -2.086426 -0.655405
2 -2.367950 -0.318477
3 -2.304197 -0.575368
4 -2.388777  0.674767
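Now I want to determine PC-1 for a new set of 'sl', 'sw', 'pl' and 'pw' values, say 4.8, 3.1, 1.3, 0.2. How can I do this? I could not find any method for it in the sklearn library.
Edit: As mentioned in the comments, I can use pca.transform(new_data) to get the PC values for new data. However, I am interested in obtaining the variable loadings, so that I can use those numbers to compute PC values later and anywhere, not only in the current environment.
By loadings I mean "the weight by which each standardized original variable should be multiplied to get the component score" (from https://en.wikipedia.org/wiki/Principal_component_analysis). I could not find a way to do this on the documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html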
Answer 0 (score: 0)
Here is the source of the transform function:
def transform(self, X):
    """Apply dimensionality reduction to X.

    X is projected on the first principal components previously extracted
    from a training set.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        New data, where n_samples is the number of samples
        and n_features is the number of features.

    Returns
    -------
    X_new : array-like, shape (n_samples, n_components)

    Examples
    --------
    >>> import numpy as np
    >>> from sklearn.decomposition import IncrementalPCA
    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    >>> ipca = IncrementalPCA(n_components=2, batch_size=3)
    >>> ipca.fit(X)
    IncrementalPCA(batch_size=3, copy=True, n_components=2, whiten=False)
    >>> ipca.transform(X) # doctest: +SKIP
    """
    check_is_fitted(self, ['mean_', 'components_'], all_or_any=all)
    X = check_array(X)
    if self.mean_ is not None:
        X = X - self.mean_
    X_transformed = np.dot(X, self.components_.T)
    if self.whiten:
        X_transformed /= np.sqrt(self.explained_variance_)
    return X_transformed
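To make the steps in that function concrete, here is a minimal sketch that repeats the centering and projection by hand on the standardized iris data from the question and compares the result with pca.transform. It assumes the default whiten=False, so the final division is skipped; the variable names X_std, manual and matches are only illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# Same setup as in the question: standardize iris, keep two components.
X_std = scale(datasets.load_iris().data)
pca = PCA(n_components=2).fit(X_std)

# Manual projection: subtract mean_, then multiply by components_.T.
manual = (X_std - pca.mean_) @ pca.components_.T

# Compare with the library's own transform (up to floating-point error).
matches = np.allclose(manual, pca.transform(X_std))
print(matches)
```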
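The variable loadings are the components you get from pca.components_, an array of shape (n_components, n_features) with one row of weights per component. Make sure mean_ is 0 (it is, up to rounding, when the training data was standardized) and whiten is False; then you can simply take that matrix and use it anywhere you want to transform a matrix or vector. Keep in mind that new raw values must first be standardized with the means and standard deviations of the training data before they are multiplied by the loadings.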
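As a rough sketch of how this could look for the sample from the question (4.8, 3.1, 1.3, 0.2): the code below swaps scale for StandardScaler, purely so that the training means and standard deviations are kept as attributes, and then applies the loadings with plain NumPy. The helper project and the names center, spread and loadings are hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data  # columns: sl, sw, pl, pw

# Fit the standardization and the PCA on the training data.
scaler = StandardScaler().fit(X)
pca = PCA(n_components=2).fit(scaler.transform(X))

# Plain arrays that can be stored and reused without scikit-learn.
center = scaler.mean_        # per-feature training means
spread = scaler.scale_       # per-feature training standard deviations
loadings = pca.components_   # shape (2, 4): one row of weights per component


def project(sample, center, spread, loadings):
    """Standardize raw values, then apply the loadings (hypothetical helper)."""
    z = (np.asarray(sample, dtype=float) - center) / spread
    return z @ loadings.T


new_sample = [4.8, 3.1, 1.3, 0.2]
scores = project(new_sample, center, spread, loadings)
print(scores)  # [PC-1, PC-2] for the new sample

# Cross-check against scikit-learn's own scaler + PCA.
expected = pca.transform(scaler.transform([new_sample]))[0]
agrees = np.allclose(scores, expected)
```

The three arrays are all that is needed outside the current environment; they could be written to disk with, for example, np.savez and loaded wherever the scores have to be computed.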