How do I correlate countries over years?

Posted: 2016-04-18 11:07:49

Tags: correlation stata

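I am trying to find the correlations between countries' GDP using Stata. I am using the data from Penn World Tables 8.1 (http://www.rug.nl/research/ggdc/data/pwt/v81/pwt81.zip). It is a huge table with a lot of macroeconomic statistics, but I am really only interested in the variables country, year and rgdpna (GDP).

I have been creating a new variable for each country I am interested in and then trying to correlate those variables with pwcorr. However, this approach produces a lot of missing values and no correlations. My code is: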

/*We generate variables for countries*/

/*We use Sweden as reference point and find 5 near-by countries.
The chosen countries are Sweden, Norway, Finland, Germany, Denmark*/
gen swe = rgdpna if country == "Sweden" & year >= 1997
gen nor = rgdpna if country == "Norway" & year >= 1997
gen fin = rgdpna if country == "Finland" & year >= 1997
gen ger = rgdpna if country == "Germany" & year >= 1997
gen den = rgdpna if country == "Denmark" & year >= 1997

/*Then we choose 5 far-away countries. The chosen countries are
Canada, China, Japan, Russia, US*/
gen can = rgdpna if country == "Canada" & year >= 1997
gen usa = rgdpna if country == "United States" & year >= 1997
gen rus = rgdpna if country == "Russian Federation" & year >= 1997
gen chn = rgdpna if country == "China, People's Republic of" & year >= 1997
gen jap = rgdpna if country == "Japan" & year >= 1997

/*pwcorr the variables*/
pwcorr swe nor fin ger den can usa rus chn

This gives the following result:

             |      swe      nor      fin      ger      den      can      usa
-------------+---------------------------------------------------------------
         swe |   1.0000 
         nor |        .   1.0000 
         fin |        .        .   1.0000 
         ger |        .        .        .   1.0000 
         den |        .        .        .        .   1.0000 
         can |        .        .        .        .        .   1.0000 
         usa |        .        .        .        .        .        .   1.0000 
         rus |        .        .        .        .        .        .        . 
         chn |        .        .        .        .        .        .        . 

             |      rus      chn
-------------+------------------
         rus |   1.0000 
         chn |        .   1.0000 

Does anyone know how to solve this?

1 answer:

Answer 0 (score: 2)

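You have a panel (long) data structure, so different countries are in different observations. Each of your new variables is nonmissing only in the rows for its own country, so no observation ever holds values for two countries at once. You should therefore not be surprised that every result is missing unless a country is compared with itself. You need to reshape first, something like this.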

keep if year >= 1997
local c1 inlist(country, "Sweden", "Norway", "Finland", "Germany", "Denmark") 
local c2 inlist(country, "Canada", "United States", "Russian Federation", "China, People's Republic of", "Japan") 
keep if `c1' | `c2' 
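keep country year rgdpna
* reshape turns the values of j() into variable-name suffixes, so shorten
* the names that contain spaces or punctuation before reshaping
replace country = "China" if strpos(country, "China")
replace country = subinstr(country, " ", "", .)
reshape wide rgdpna, i(year) j(country) string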
pwcorr rgdpna*
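
To see why the long layout yields only missing correlations, here is a minimal sketch on a made-up two-country panel. The variable name gdp and all the numbers are invented for illustration and are not taken from Penn World Tables.

```stata
* Toy long-format panel: 2 countries x 4 years (made-up values)
clear
input str7 country int year double gdp
"Sweden" 1997 100
"Sweden" 1998 104
"Sweden" 1999 109
"Sweden" 2000 115
"Norway" 1997 80
"Norway" 1998 82
"Norway" 1999 87
"Norway" 2000 90
end

* Long form: each new variable is filled only in its own country's rows
gen swe = gdp if country == "Sweden"
gen nor = gdp if country == "Norway"
count if !missing(swe, nor)    // no observation has both values
drop swe nor

* Wide form: one row per year, one column per country
reshape wide gdp, i(year) j(country) string
list
pwcorr gdpSweden gdpNorway
```

After the reshape every year is a single observation holding both countries' values, so pwcorr has pairs to work with.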