Saving and using PCA eigenvectors

Time: 2018-08-31 10:42:55

Tags: command stata pca

I ran a principal component analysis (PCA) in Stata.

My dataset contains eight financial indicators that vary across 9 countries.

For example:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"    -.1916055239385184  .046331346724579184  .16438012750896466    .073106839282063 30.373216652548326  4.116650784492168  3.222219873614461  .01453109309122077 2010
"UK"       -.09287803170279468   .10772082765154019  .19475363707485557  .05803923583546618 31.746409646181174  9.669982727208433 1.2958094802269167 .014273374324088752 2010
"US"       -.06262935107629553   .08674901201182428   .1241593221865416  .13387194413811226 25.336612638526013  11.14330064161111  1.954785887176916 .008355601163285917 2010
"Italy"   -.038025847122363045    .1523162032749684  .23885658237030563   .2057478638900476  31.02007902336988 2.9660938817562292   6.12544787693943 .011694993164234125 2010
"Germany"  -.05454795914578491   .06287079763890834  .09347194572148769  .08730237262847926 35.614342337621174  12.03770488195981 1.1958205191308358 .012467084153714813 2010
"Spain "   -.09133982259799572    .1520056836126315  .20905656056324853  .21054797530580743 30.133833346916546 2.0623245902645073  5.122615899157435 .013545432336873187 2010
"Sweden"   -.05403262462960799   .20463787181576967  .22924827352771968  .05655833155565016  20.30540887860061 10.392313613725324  .8634381995636089 .008030624504967313 2010
"Norway "  -.07560184571862992   .08383822093909514  .15469418498932822  .06569716455818478 29.568228705840234 14.383460621594622 1.5561013535825234 .012843159364225464 2010
"Algeria"   -.0494187835163535  .056252436429004446  .09174672864585759  .08143181185307143  34.74103858167055 15.045254276254616 1.2074942921860699 .011578038401820303 2010
"France"   -.03831442432584342   .14722819896988698  .22035417794604084  .12183886462162773  28.44763045286005 12.727100288710087  1.405629911115614 .011186908059399987 2011
"UK"       -.05002189329928202   .16833493262244398   .2288402623558823  .04977050186975224 27.640103129372747  11.17376089844228 1.1764542835994092 .008386726178729322 2011
"US"        -.0871005985124144   .10270482619857023   .1523559355903486  .06775742210623094 26.840586700880362 10.783899184031576  1.454011947763254 .013501919089967212 2011
"Italy"     -.1069324103590126   -.5877872620957578 -.47469302172710803   .2004436360021364 23.133243742952658 5.3936761686065875  4.532771849692548 .012586313916956204 2011
"Germany"  -.05851794344524515   .09960345907923154    .136805115392161   .1373407846168154   32.6182637042919 14.109738344526052 1.5077699357228835 .013200993625042274 2011
"Spain "   -.10650743527105216 -.015785638597076792   .1808727613216441  .05038848927405154  28.22206251292902 10.839614113486853 1.5021425852392374 .012076771099482617 2011
"Sweden"   -.09678946710644694   .11801761803893955  .18569993056826523   .1481844716617448 27.439283362903794  5.771154420635893  5.493437819181101 .013820243145673811 2011
"Norway "  -.04263379351591438   .09931719473864983  .14469611775596314   .0796835513869996  26.68561168581991  14.06385602832082 1.5200488174887825  .01029136242440406 2011
"Algeria"  -.04871983526465598    .2139061303228528   .2728647845448156 .056537570099712456  22.50263575072073 16.919641035094685  .7539881754626142 .009734650338902404 2011
end

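After rotation, I labeled the first component "Debt" and the second "Profitability".

I have the same data for 2011, 2012, 2013, 2014, and so on. I would like to take the weight matrix that Stata computed for 2010 and apply it separately to 2011, 2012, and 2013. My goal is to compare debt and profitability across countries over time.

To do this, I used the estimates save and estimates use commands (see chapter 20 of the Stata manual on estimation and postestimation commands, and the help for PCA postestimation).

However, I do not understand what Stata is saving. Does it save the scores computed for 2010, or the eigenvalues and eigenvectors?

Here is the code I used: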

tempfile pca
save `pca'
use `pca' if Year==2010
global xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity
pca $xlist, components(2)
estimates save pcaest, replace
predict score
summarize score
use `pca' if Year==2011, clear
estimates use pcaest
predict score
summarize score

  1. Do this approach and code look correct to you?

  2. I would also like to save the weight matrix and create a new vector Z = b[1,1]*Investment + ...

1 Answer:

Answer 0 (score: 1)

Using your toy example for 2010:

clear

input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"    -.1916055239385184  .046331346724579184  .16438012750896466    .073106839282063 30.373216652548326  4.116650784492168  3.222219873614461  .01453109309122077 2010
"UK"       -.09287803170279468   .10772082765154019  .19475363707485557  .05803923583546618 31.746409646181174  9.669982727208433 1.2958094802269167 .014273374324088752 2010
"US"       -.06262935107629553   .08674901201182428   .1241593221865416  .13387194413811226 25.336612638526013  11.14330064161111  1.954785887176916 .008355601163285917 2010
"Italy"   -.038025847122363045    .1523162032749684  .23885658237030563   .2057478638900476  31.02007902336988 2.9660938817562292   6.12544787693943 .011694993164234125 2010
"Germany"  -.05454795914578491   .06287079763890834  .09347194572148769  .08730237262847926 35.614342337621174  12.03770488195981 1.1958205191308358 .012467084153714813 2010
"Spain "   -.09133982259799572    .1520056836126315  .20905656056324853  .21054797530580743 30.133833346916546 2.0623245902645073  5.122615899157435 .013545432336873187 2010
"Sweden"   -.05403262462960799   .20463787181576967  .22924827352771968  .05655833155565016  20.30540887860061 10.392313613725324  .8634381995636089 .008030624504967313 2010
"Norway "  -.07560184571862992   .08383822093909514  .15469418498932822  .06569716455818478 29.568228705840234 14.383460621594622 1.5561013535825234 .012843159364225464 2010
"Algeria"   -.0494187835163535  .056252436429004446  .09174672864585759  .08143181185307143  34.74103858167055 15.045254276254616 1.2074942921860699 .011578038401820303 2010
end

I get the following results:

local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity
pca `xlist', components(2)

Principal components/correlation                 Number of obs    =          9
                                                 Number of comp.  =          2
                                                 Trace            =          8
    Rotation: (unrotated = principal)            Rho              =     0.7468

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.43566      .896796             0.4295       0.4295
           Comp2 |      2.53887      1.23215             0.3174       0.7468
           Comp3 |      1.30672      .750756             0.1633       0.9102
           Comp4 |      .555959      .472866             0.0695       0.9797
           Comp5 |     .0830926     .0181769             0.0104       0.9900
           Comp6 |     .0649157     .0526462             0.0081       0.9982
           Comp7 |     .0122695    .00975098             0.0015       0.9997
           Comp8 |    .00251849            .             0.0003       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    ------------------------------------------------
        Variable |    Comp1     Comp2 | Unexplained 
    -------------+--------------------+-------------
      Investment |   0.0004   -0.3837 |       .6262 
          Profit |   0.3896   -0.3794 |       .1131 
          Income |   0.4621   -0.1162 |        .232 
             Tax |   0.4146    0.1236 |       .3706 
       Repayment |  -0.1829    0.4747 |       .3131 
        Leverage |  -0.4685   -0.2596 |      .07464 
        Interest |   0.4580    0.2625 |       .1045 
       Liquidity |  -0.0082    0.5643 |       .1913 
    ------------------------------------------------
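
As a quick sanity check on the table above, the Proportion column is each eigenvalue divided by the trace (8, the number of variables in a correlation-based PCA), and Rho is the cumulative proportion of the retained components. The lines below are only arithmetic on the numbers printed above; they should come out at roughly 0.4295 and 0.7468.

```stata
* Proportion of variance explained by Comp1: eigenvalue / trace
display 3.43566 / 8

* Rho for two retained components: (eigenvalue 1 + eigenvalue 2) / trace
display (3.43566 + 2.53887) / 8
```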

To see what kinds of items the pca command returns:

 ereturn list

scalars:
                  e(N) =  9
                  e(f) =  2
                e(rho) =  .7468162625387222
              e(trace) =  8
              e(lndet) =  -13.76082122673546
               e(cond) =  36.93476257313668

macros:
            e(cmdline) : "pca Investment Profit Income Tax Repayment Leverage Interest Liquidity, components(2)"
                e(cmd) : "pca"
              e(title) : "Principal components"
       e(marginsnotok) : "_ALL"
          e(estat_cmd) : "pca_estat"
         e(rotate_cmd) : "pca_rotate"
            e(predict) : "pca_p"
              e(Ctype) : "correlation"
         e(properties) : "nob noV eigen"

matrices:
                e(sds) :  1 x 8
              e(means) :  1 x 8
                  e(C) :  8 x 8
                e(Psi) :  1 x 8
                 e(Ev) :  1 x 8
                  e(L) :  8 x 2

functions:
             e(sample)   
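
Because estimates save writes these e() results to disk, it stores the eigenvalues in e(Ev) and the eigenvectors in e(L), not the scores; scores are ordinary variables that exist only once predict creates them. A minimal sketch for inspecting the stored matrices, assuming the pca command above has just been run:

```stata
* Eigenvectors (loadings) of the two retained components, 8 x 2
matrix list e(L)

* Eigenvalues, 1 x 8
matrix list e(Ev)

* Means and standard deviations of the variables in the estimation sample
matrix list e(means)
matrix list e(sds)
```

The same listing should work after estimates use pcaest, since that command restores the saved e() results.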

One way to reuse the returned matrix of eigenvectors with the next year's variables is to make a copy of the matrix and then load the 2011 data:

matrix A = e(L)

clear

input str7 Country double(Investment Profit Income Tax Repayment Leverage Interest Liquidity) int Year
"France"   -.03831442432584342   .14722819896988698  .22035417794604084  .12183886462162773  28.44763045286005 12.727100288710087  1.405629911115614 .011186908059399987 2011
"UK"       -.05002189329928202   .16833493262244398   .2288402623558823  .04977050186975224 27.640103129372747  11.17376089844228 1.1764542835994092 .008386726178729322 2011
"US"        -.0871005985124144   .10270482619857023   .1523559355903486  .06775742210623094 26.840586700880362 10.783899184031576  1.454011947763254 .013501919089967212 2011
"Italy"     -.1069324103590126   -.5877872620957578 -.47469302172710803   .2004436360021364 23.133243742952658 5.3936761686065875  4.532771849692548 .012586313916956204 2011
"Germany"  -.05851794344524515   .09960345907923154    .136805115392161   .1373407846168154   32.6182637042919 14.109738344526052 1.5077699357228835 .013200993625042274 2011
"Spain "   -.10650743527105216 -.015785638597076792   .1808727613216441  .05038848927405154  28.22206251292902 10.839614113486853 1.5021425852392374 .012076771099482617 2011
"Sweden"   -.09678946710644694   .11801761803893955  .18569993056826523   .1481844716617448 27.439283362903794  5.771154420635893  5.493437819181101 .013820243145673811 2011
"Norway "  -.04263379351591438   .09931719473864983  .14469611775596314   .0796835513869996  26.68561168581991  14.06385602832082 1.5200488174887825  .01029136242440406 2011
"Algeria"  -.04871983526465598    .2139061303228528   .2728647845448156 .056537570099712456  22.50263575072073 16.919641035094685  .7539881754626142 .009734650338902404 2011
end

You can then simply use the svmat command, which stores the columns of the matrix as the new variables A1 and A2:

svmat A

list A* if _n < 9

     +-----------------------+
     |        A1          A2 |
     |-----------------------|
  1. |  .0003921    -.383703 |
  2. |  .3895898   -.3793983 |
  3. |  .4621098   -.1162487 |
  4. |  .4146066    .1235683 |
  5. | -.1828703    .4746658 |
     |-----------------------|
  6. | -.4685374   -.2596268 |
  7. |   .457974    .2624738 |
  8. | -.0081538    .5643047 |
     +-----------------------+
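
To get from the stored loadings to the weighted sum Z = b[1,1]*Investment + ... asked about in the question, the matrix elements can be used directly in an expression. The following is a sketch rather than a definitive implementation. It assumes all years sit in one dataset in memory with the variable Year, and it standardizes every year with the 2010 means and standard deviations, which is one possible design choice for keeping years comparable. The names M, S, and Z1 are introduced here for illustration only.

```stata
* Sketch: assumes a single dataset in memory containing all years
local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity

* Fit on 2010 only, then copy what is needed out of e()
pca `xlist' if Year == 2010, components(2)
matrix A = e(L)        // 8 x 2 loadings (eigenvectors)
matrix M = e(means)    // 1 x 8 means from the 2010 sample
matrix S = e(sds)      // 1 x 8 standard deviations from the 2010 sample

* Z1 = sum over j of A[j,1] * (x_j - mean_j) / sd_j, for every observation
generate double Z1 = 0
local j = 0
foreach v of local xlist {
    local ++j
    replace Z1 = Z1 + A[`j', 1] * (`v' - M[1, `j']) / S[1, `j']
}

* Compare the component-1 index across years
bysort Year: summarize Z1
```

Replacing the column index 1 with 2 in A[`j', 1] gives the corresponding index for the second component.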

Edit:

Revised in response to the comments:

use X1, clear

local xlist Investment Profit Income Tax Repayment Leverage Interest Liquidity

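forvalues i = 1/5 {
    * The year variable name is kept as written here; the question's data calls it Year
    pca `xlist' if year == 201`i', components(2)
    matrix A201`i' = e(L)
    svmat A201`i'

    * Component 1: row j of the matrix holds the loading of the j-th variable
    generate B201`i'1 = (A201`i'[1,1] * Investment) + (A201`i'[2,1] * Profit) + ///
                        (A201`i'[3,1] * Income) + (A201`i'[4,1] * Tax) + ///
                        (A201`i'[5,1] * Repayment) + (A201`i'[6,1] * Leverage) + ///
                        (A201`i'[7,1] * Interest) + (A201`i'[8,1] * Liquidity)

    * Component 2: same rows, second column of the matrix
    generate B201`i'2 = (A201`i'[1,2] * Investment) + (A201`i'[2,2] * Profit) + ///
                        (A201`i'[3,2] * Income) + (A201`i'[4,2] * Tax) + ///
                        (A201`i'[5,2] * Repayment) + (A201`i'[6,2] * Leverage) + ///
                        (A201`i'[7,2] * Interest) + (A201`i'[8,2] * Liquidity)
}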