How to use a PCA model to predict scores for new data in Stata?

Posted: 2016-11-22 17:36:40

Tags: stata pca predict

My question is similar to "R: using predict() on new data with high dimensionality", but for Stata.

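I want to run a principal component analysis (pca) on one subset of the data (the control group of an experiment) to extract the first component. I then want to apply that PCA model to a separate subset of the data (the treatment group of the experiment) and obtain scores for those observations. Basically, I want to use the pca model fitted on dataset_1 to predict scores in a new dataset_2.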

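In R, you fit the model to the control group only, then call predict() on the fitted model and pass the full data in the newdata argument. This produces predictions for all observations, based only on the model fitted to the control group. How can I do this in Stata?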
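Scoring new observations with a fitted PCA does not refit anything; it only reuses quantities estimated on the original sample. As a minimal sketch, assuming a correlation-based PCA (Stata's default for pca) with unit-length eigenvectors, the score of observation $i$ on component $k$ is:

```latex
s_{ik} = \sum_{j=1}^{p} a_{jk}\,\frac{x_{ij} - \bar{x}_j}{\hat{\sigma}_j}
```

Here the loadings $a_{jk}$, the means $\bar{x}_j$ and the standard deviations $\hat{\sigma}_j$ of the $p$ input variables all come from the estimation sample (the control group), while $x_{ij}$ can belong to any observation, including the treatment group. After an orthogonal rotation such as varimax, $a_{jk}$ would be the rotated loadings instead.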

* Items entering the PCA
global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a
* Exploratory PCA on all observations; inspect the scree plot
pca $xlist2a
screeplot, yline(1)
rotate, clear
* Keep 3 components, rotate them, and compute the scores
pca $xlist2a, components(3)
rotate, varimax blanks(.30)
predict pca5_p1b pca5_p2b pca5_p3b, score

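Fixed code based on Nick's answer (the PCA is now fitted on the control group only, via if zgroupa10==1, while predict scores every observation):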

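* Items entering the PCA
global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a
* Exploratory PCA on the control group only
pca $xlist2a if zgroupa10==1
screeplot, yline(1)
rotate, clear
* Final model, still fitted on the control group only
pca $xlist2a if zgroupa10==1, components(3)
rotate, varimax blanks(.30)
* No if qualifier here, so all observations are scored
predict pca5_p1b pca5_p2b pca5_p3b, score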
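If only the treatment-group scores are wanted, an if qualifier on predict restricts the scoring while the model stays fitted on the control group. This is a minimal sketch, assuming (not stated in the question) that the treatment group is coded zgroupa10==0; the score variable names trt_pc1 to trt_pc3 are hypothetical.

```stata
* Items entering the PCA (same list as in the question)
global xlist2a std_agreedisagree1_1_a std_revagreedisagree1_2_a std_revagreedisagree1_3_a std_agreedisagree1_4_a std_revagreedisagree1_10_a std_revagreedisagree1_5_a

* Fit the PCA on the control group only (zgroupa10 == 1)
pca $xlist2a if zgroupa10 == 1, components(3)

* Rotate; predict then uses the rotated components
rotate, varimax blanks(.30)

* Score only the treatment group (ASSUMED to be zgroupa10 == 0).
* Loadings, means and SDs all come from the control-group fit.
predict trt_pc1 trt_pc2 trt_pc3 if zgroupa10 == 0, score

* Without the if qualifier, predict would score every observation
* that has nonmissing values on $xlist2a
summarize trt_pc1 trt_pc2 trt_pc3
```

Observations outside the if condition, including the control group, are left missing in the new score variables.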

1 answer:

Answer (score: 0)

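What code did you try? The simplest experiment shows that the same approach works in Stata. Here the PCA is fitted on foreign cars only, and the result is used to score domestic cars: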

. sysuse auto, clear
(1978 Automobile Data)

. pca headroom trunk length displacement if foreign

Principal components/correlation                 Number of obs    =         22
                                                 Number of comp.  =          4
                                                 Trace            =          4
    Rotation: (unrotated = principal)            Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.93666      .656823             0.4842       0.4842
           Comp2 |      1.27983      .615381             0.3200       0.8041
           Comp3 |      .664453      .545396             0.1661       0.9702
           Comp4 |      .119057            .             0.0298       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 | Unexplained 
    -------------+----------------------------------------+-------------
        headroom |   0.0288    0.7373    0.6749    0.0083 |           0 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 |           0 
          length |   0.6849   -0.1313    0.1229   -0.7061 |           0 
    displacement |   0.6858   -0.1313    0.1054    0.7080 |           0 
    --------------------------------------------------------------------

. predict score1 score2 if !foreign
(score assumed)
(2 components skipped)

Scoring coefficients 
    sum of squares(column-loading) = 1

    ------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4 
    -------------+----------------------------------------
        headroom |   0.0288    0.7373    0.6749    0.0083 
           trunk |   0.2443    0.6496   -0.7199   -0.0090 
          length |   0.6849   -0.1313    0.1229   -0.7061 
    displacement |   0.6858   -0.1313    0.1054    0.7080 
    ------------------------------------------------------
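To see that predict really applies the foreign-car fit to the domestic cars, rather than refitting, the first score can be rebuilt by hand. This is a sketch that assumes pca leaves the eigenvectors in e(L) and the estimation-sample means and standard deviations in e(means) and e(sds); the names load, mu, sigma and check1 are hypothetical.

```stata
* Reproduce the answer: fit on foreign cars, score domestic cars
sysuse auto, clear
pca headroom trunk length displacement if foreign
predict double score1 if !foreign, score

* Estimation-sample eigenvectors, means and SDs (assumed stored in e())
matrix load  = e(L)
matrix mu    = e(means)
matrix sigma = e(sds)

* Rebuild the first-component score by hand for the domestic cars
generate double check1 = 0 if !foreign
local j = 0
foreach v of varlist headroom trunk length displacement {
    local ++j
    replace check1 = check1 + load[`j', 1] * (`v' - mu[1, `j']) / sigma[1, `j'] if !foreign
}

* Both versions should agree up to rounding error
assert reldif(score1, check1) < 1e-6 if !foreign
```

If the assertion passes, the domestic-car scores are computed from the foreign-car means, SDs and eigenvectors, which is the Stata counterpart of predict(fit, newdata = ...) in R.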