Python PCA plot using Hotelling's T2 for a confidence interval

Posted: 2017-10-13 14:21:46

Tags: python python-3.x statistics pca confidence-interval

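I am trying to apply PCA for a multivariate analysis and draw a score plot of the first two components with a Hotelling T2 confidence ellipse in Python. I can already produce the scatter plot; what I want is to add a 95% confidence ellipse to it. It would be great if anyone knows how to do this in Python.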

Sample image of the expected output:

[Image: scatter plot of two principal components]

1 answer:

Answer 0 (score: 0)

The pca library provides Hotelling's T2 and SPE/DmodX outlier detection.

pip install pca

from pca import pca
import pandas as pd
import numpy as np

# Create dataset with 100 samples
X = np.random.normal(0, 1, size=(100, 5))
# Create 5 outliers
outliers = np.random.uniform(5, 10, size=(5, 5))
# Combine data
X = np.vstack((X, outliers))

# Initialize the model. alpha is the significance level of the Hotelling's T2 test used to flag outliers in the data.
model = pca(alpha=0.05)

# Fit transform
out = model.fit_transform(X)
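For readers who want to see what the alpha threshold means without the pca package, here is a minimal NumPy-only sketch of the statistic. It assumes the textbook definitions: T2_i = sum over a of t_ia^2 / s_a^2 on the first two principal component scores, with the control limit T2_lim = A(n-1)/(n-A) * F_{1-alpha}(A, n-A) for A = 2 components. The helpers hotelling_t2_scores and t2_limit_2pc are hypothetical names, and this is not the pca package's own code, which may compute its threshold differently.

```python
import numpy as np


def hotelling_t2_scores(X, n_components=2):
    """PCA scores and per-sample Hotelling T2 on the first n_components PCs."""
    Xc = X - X.mean(axis=0)                               # center each variable
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)      # Xc = U diag(S) Vt
    scores = U[:, :n_components] * S[:n_components]       # PC scores t_ia
    var = S[:n_components] ** 2 / (X.shape[0] - 1)        # sample variance of each score
    t2 = np.sum(scores ** 2 / var, axis=1)                # T2_i = sum_a t_ia^2 / s_a^2
    return scores, t2


def t2_limit_2pc(n, alpha=0.05):
    """Hotelling T2 limit for A = 2 components: 2(n-1)/(n-2) * F_{1-alpha}(2, n-2).

    With 2 numerator degrees of freedom the F quantile has the closed form
    (d2 / 2) * (alpha ** (-2 / d2) - 1), so SciPy is not needed.
    """
    d2 = n - 2
    f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)
    return 2 * (n - 1) / d2 * f_crit


# Same kind of data as in the answer: 100 normal samples plus 5 outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 5)),
               rng.uniform(5, 10, size=(5, 5))])

scores, t2 = hotelling_t2_scores(X, n_components=2)
limit = t2_limit_2pc(len(X), alpha=0.05)
print(np.flatnonzero(t2 > limit))  # row indices outside the 95% T2 ellipse
```

Because the scores are uncorrelated, a sample with T2 above the limit is exactly a point lying outside the 95% ellipse in the PC1/PC2 plane.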

Print the outliers with:
print(out['outliers'])

#            y_proba      y_score  y_bool  y_bool_spe  y_score_spe
# 1.0   9.799576e-01     3.060765   False       False     0.993407
# 1.0   8.198524e-01     5.945125   False       False     2.331705
# 1.0   9.793117e-01     3.086609   False       False     0.128518
# 1.0   9.743937e-01     3.268052   False       False     0.794845
# 1.0   8.333778e-01     5.780220   False       False     1.523642
# ..             ...          ...     ...         ...          ...
# 1.0   6.793085e-11    69.039523    True        True    14.672828
# 1.0  2.610920e-291  1384.158189    True        True    16.566568
# 1.0   6.866703e-11    69.015237    True        True    14.936442
# 1.0  1.765139e-292  1389.577522    True        True    17.183093
# 1.0  1.351102e-291  1385.483398    True        True    17.319038
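The y_score_spe and y_bool_spe columns come from the package's SPE/DmodX check. As a point of reference only, the textbook squared prediction error (also called the Q statistic) is the squared residual of each sample after reconstructing it from the first k components. The sketch below computes that quantity; spe is a hypothetical helper, and the pca package's own y_score_spe may be defined differently.

```python
import numpy as np


def spe(X, n_components=2):
    """Textbook squared prediction error (Q statistic) of each row of X."""
    Xc = X - X.mean(axis=0)                         # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                         # loadings, shape (n_features, k)
    residual = Xc - Xc @ P @ P.T                    # part of each row the k PCs miss
    return np.sum(residual ** 2, axis=1)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 5)),   # 100 ordinary samples
               rng.uniform(5, 10, size=(5, 5))])  # 5 outliers, as in the answer
q = spe(X, n_components=2)
print(np.argsort(q)[-5:])                         # rows with the largest SPE
```

Large SPE values mark samples that the retained components do not describe well, which is complementary to T2: a point can sit inside the T2 ellipse and still be poorly reconstructed.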

Make the plot:

model.biplot(legend=True, SPE=True, hotellingt2=True)

[Image: pca biplot with outliers]
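If you already have your own PC1/PC2 scatter plot, as in the question, the 95% ellipse can also be computed by hand. A minimal sketch, assuming the textbook Hotelling limit for two components, T2_lim = 2(n-1)/(n-2) * F_0.95(2, n-2), and mean-centered scores: because PCA scores are uncorrelated, the ellipse is axis-aligned with semi-axes sqrt(T2_lim * var(t_a)). The function t2_ellipse is a hypothetical helper, not part of the pca package.

```python
import numpy as np


def t2_ellipse(scores, alpha=0.05, n_points=200):
    """x, y coordinates of the Hotelling T2 ellipse in the PC1/PC2 score plane.

    scores: array of shape (n_samples, 2) with mean-centered PC1 and PC2 scores.
    Assumes the limit T2_lim = 2(n-1)/(n-2) * F_{1-alpha}(2, n-2).
    """
    n = scores.shape[0]
    d2 = n - 2
    f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)   # closed-form F(2, d2) quantile
    t2_lim = 2 * (n - 1) / d2 * f_crit
    # PC scores are uncorrelated, so the ellipse is axis-aligned and its
    # semi-axes are sqrt(T2_lim * var(t_a)) for a = 1, 2.
    half_axes = np.sqrt(t2_lim * scores.var(axis=0, ddof=1))
    theta = np.linspace(0, 2 * np.pi, n_points)
    return half_axes[0] * np.cos(theta), half_axes[1] * np.sin(theta)


# PC1/PC2 scores from a plain SVD-based PCA of the answer's kind of data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 5)), rng.uniform(5, 10, size=(5, 5))])
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * S[:2]
ex, ey = t2_ellipse(scores, alpha=0.05)
```

With matplotlib, plt.scatter(scores[:, 0], scores[:, 1]) followed by plt.plot(ex, ey) overlays the ellipse on the score plot; points outside it are the samples whose T2 exceeds the 95% limit.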