Between and within standard deviations for panel data in R

Date: 2015-03-12 14:13:57

Tags: r

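I want to use R to determine the within, between, and overall standard deviations of panel data. I found this very similar question, "Between/within standard deviations in R", but I don't know how to apply its solution to my data.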

Let's use the following data as an example:

library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")

This gives the following output:

   country year           y y_bin          x1         x2          x3   opinion
1        A 1990  1342787840     1  0.27790365 -1.1079559  0.28255358 Str agree
2        A 1991 -1899660544     0  0.32068470 -0.9487200  0.49253848     Disag
3        A 1992   -11234363     0  0.36346573 -0.7894840  0.70252335     Disag
4        A 1993  2645775360     1  0.24614404 -0.8855330 -0.09439092     Disag
5        A 1994  3008334848     1  0.42462304 -0.7297683  0.94613063     Disag
6        A 1995  3229574144     1  0.47721413 -0.7232460  1.02968037 Str agree
7        A 1996  2756754176     1  0.49980500 -0.7815716  1.09228814     Disag
8        A 1997  2771810560     1  0.05162839 -0.7048455  1.41590083 Str agree
9        A 1998  3397338880     1  0.36641079 -0.6983712  1.54872274     Disag
10       A 1999    39770336     1  0.39584252 -0.6431540  1.79419804 Str disag
11       B 1990 -5934699520     0 -0.08184998  1.4251202  0.02342812     Agree
12       B 1991  -711623744     0  0.10616001  1.6496018  0.26036251 Str agree
13       B 1992 -1933116160     0  0.35378519  1.5937191 -0.23439877     Agree
14       B 1993  3072741632     1  0.72677696  1.6917576  0.25622433 Str disag
15       B 1994  3768078848     1  0.71939486  1.7414261  0.41174951     Disag
16       B 1995  2837581312     1  0.67154658  1.7083139  0.53584301 Str disag
17       B 1996   577199360     1  0.81985730  1.5324961 -0.49964902 Str agree
18       B 1997  1786851584     1  0.88016719  1.5021962 -0.57626772     Disag
19       B 1998  -149072048     0  0.70451611  1.4236463 -0.44841924     Agree
20       B 1999 -1174480128     0  0.23696731  1.4545859 -0.04936399 Str disag
21       C 1990 -1292379264     0  1.31256068 -1.2931356  0.20408297     Agree
22       C 1991 -3415966464     0  1.17748356 -1.3442180  0.28397188 Str agree
23       C 1992  -355804672     0  1.25640798 -1.2599510  0.37339270     Agree
24       C 1993  1225180032     1  1.42154455 -1.3117452 -0.37596563     Disag
25       C 1994  3802287616     1  1.11419308 -1.2849948  0.56046754 Str disag
26       C 1995  1959696640     1  1.15948391 -1.2188276  0.69540799     Agree
27       C 1996   530576672     1  1.16045427 -1.2350063  0.81689382     Agree
28       C 1997  3128852224     1  1.44641161 -1.3275964 -0.14206907 Str disag
29       C 1998  3201045760     1  1.15162671 -1.2061129  1.19458139 Str agree
30       C 1999  4663067648     1  1.19054413 -1.1266172  1.67016041     Disag
31       D 1990  1883025152     1 -0.31391269  1.7366557  0.64663702     Disag
32       D 1991  6037768704     1  0.36009100  2.1318641  1.09994173     Disag
33       D 1992    10244189     1  0.05188770  1.6816775  0.96976823 Str agree
34       D 1993  5067265024     1  0.20944354  1.6149769 -0.21257821 Str agree
35       D 1994  3882478336     1  0.38207000  1.5683011 -1.16538668     Disag
36       D 1995  8827006976     1  0.24208580  1.5412215 -0.18413101     Agree
37       D 1996  5782000128     1  0.48636678  1.7423391 -0.03731453 Str disag
38       D 1997  5090524160     1  0.35942599  1.8742865  0.08786795 Str agree
39       D 1998  1850565248     1  0.23220351  1.5953021  0.07247547     Disag
40       D 1999 -2025476864     0 -0.07998896  1.7047973  0.55843300 Str agree
41       E 1990  1342787840     1  0.45286715  1.7284026  0.59705788 Str disag
42       E 1991  2296009472     1  0.41904032  1.7068400  0.79313534 Str agree
43       E 1992  1737627776     1  0.38521346  1.6852775  0.98921281     Agree
44       E 1993   113973136     1 -0.24428773  1.6492835  1.22413278 Str agree
45       E 1994   260098048     1  1.39113998  2.5302765 -0.52620137 Str disag
46       E 1995 -7863482880     0  0.31968558  1.1890552 -0.48425370     Agree
47       E 1996  3520491520     1  0.61097682  1.4845277 -0.97895509     Agree
48       E 1997  5234565120     1  0.71761495  1.5544620 -0.98863661 Str disag
49       E 1998   344746176     1  0.69613826  1.7010406 -0.08965246     Disag
50       E 1999   243920688     1  0.60662067  1.6119040 -0.08929884 Str disag
51       F 1990  1342787840     1 -0.56757486 -0.3466710  1.25841928 Str agree
52       F 1991  3560401920     1  0.15974578 -0.4641182  0.32665297 Str disag
53       F 1992  3192281088     1  0.88706642 -0.5815655 -0.60511333     Agree
54       F 1993  8941232128     1  0.53241795 -0.7553238 -0.51157588     Agree
55       F 1994  8124504576     1  0.87260014 -0.7114431  0.20570269 Str agree
56       F 1995   491740096     1  0.91935229 -0.3697441 -0.01292755 Str agree
57       F 1996  3497164544     1  1.39689231 -0.3601406  0.67867643 Str agree
58       F 1997  4764803072     1  0.98688608 -0.3590902  0.24226174 Str agree
59       F 1998 -4671723520     0  0.78830910 -0.7556524  0.73347801     Agree
60       F 1999  6349319168     1  0.27938697 -0.4601679  1.17317200     Disag
61       G 1990  1342787840     1  0.94488174 -1.5150151  1.45265734 Str disag
62       G 1991 -1518985728     0  1.09872830 -1.4614717  1.43964469     Agree
63       G 1992  1912769920     1  1.25257492 -1.4079282  1.42663205 Str agree
64       G 1993  1345690240     1  0.76276451 -1.3519315  1.85448635 Str disag
65       G 1994  2793515008     1  1.20645559 -1.3252175  2.23653030 Str disag
66       G 1995  1323696384     1  1.08718646 -1.4098167  2.82980847 Str disag
67       G 1996   254524176     1  0.78107548 -1.3279996  4.27822399 Str agree
68       G 1997  3297033216     1  1.25787950 -1.5773667  4.58732557     Disag
69       G 1998  3011820800     1  1.24277663 -1.6012177  6.11376190     Disag
70       G 1999  3296283392     1  1.23420024 -1.6217614  7.16892195     Disag

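The within standard deviation should capture the variation inside a country over the years, while the between standard deviation should capture the variation across countries. So for each variable (here: x1, x2, x3) the output should be three different standard deviations: within, between, and overall. PS: I am also using the plm and reshape2 packages.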
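A minimal sketch of what these three quantities could look like in base R. The helper name `sd_decomp` and the toy data are made up for illustration, and the within SD follows one common convention (deviations from the country mean, with the grand mean added back, and an N - 1 denominator); other conventions exist.

```r
# Toy balanced panel: 3 countries x 4 years (made-up numbers, not the Princeton data)
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(1990:1993, times = 3),
  x1      = c(1, 2, 3, 4,  11, 12, 13, 14,  21, 22, 23, 24)
)

# Hypothetical helper: overall, between and within SD of one variable
sd_decomp <- function(x, group) {
  group_mean <- ave(x, group)                # each row gets the mean of its group
  c(
    overall = sd(x),                         # all observations pooled
    between = sd(tapply(x, group, mean)),    # SD of the group means
    within  = sd(x - group_mean + mean(x))   # SD of deviations from the group means
  )
}

res <- sd_decomp(panel$x1, panel$country)
print(res)
```

For several variables at once, something like `sapply(Panel[c("x1", "x2", "x3")], sd_decomp, group = Panel$country)` should return one column per variable.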

Edit: In a second step, I compute the mean of each variable for each country with:

library(dplyr)
Panel_mean <- Panel %>% group_by(country) %>% summarise(mean(x1), mean(x2), mean(x3))

I get the variance between countries with:

Panel %>% group_by(country) %>% summarise_each(funs(mean), x1, x2, x3) %>% 
summarise_each(funs(var), x1, x2, x3)

and the variance between years with:

Panel %>% group_by(year) %>% summarise_each(funs(mean), x1, x2, x3) %>% 
summarise_each(funs(var), x1, x2, x3)

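Edit 2: Since it was asked, here are my next steps: I want to identify country-specific regressors and plot the unconditional correlation between y and each regressor. For each variable I want three "groups" of plots:

1. the overall correlation
2. the correlation between the deviations of y and of the regressor from their country means (within variation)
3. the correlation between the country means of the variables (between variation)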

Here is an example of the desired output: (image not included)

For the overall correlation, I think I can simply use lm (rather than plm, which is meant for panel data analysis), like this:

plot(Panel$x1, Panel$y)
abline(lm(y ~ x1, data = Panel))

Or am I completely on the wrong track?
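The three views listed in Edit 2 could be prepared roughly as follows. This is only a sketch on made-up toy data, with illustrative object names (`within_df`, `between_df`); it assumes "within" means deviations from the country mean and "between" means one row of country means per country.

```r
# Toy panel (made-up): y falls with x1 inside each country, but rises across countries
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  x1 = c(1, 2, 3, 4,  11, 12, 13, 14,  21, 22, 23, 24),
  y  = c(4, 3, 2, 1,  24, 23, 22, 21,  44, 43, 42, 41)
)

# Within view: deviations of each variable from its country mean
within_df <- data.frame(
  x1 = panel$x1 - ave(panel$x1, panel$country),
  y  = panel$y  - ave(panel$y,  panel$country)
)

# Between view: one row per country holding the country means
between_df <- aggregate(cbind(x1, y) ~ country, data = panel, FUN = mean)

cor_overall <- cor(panel$x1, panel$y)            # pooled observations
cor_within  <- cor(within_df$x1, within_df$y)    # variation inside countries
cor_between <- cor(between_df$x1, between_df$y)  # variation across countries
print(c(overall = cor_overall, within = cor_within, between = cor_between))
```

Each view can then be plotted the same way as the overall one, for example `plot(within_df$x1, within_df$y)` followed by `abline(lm(y ~ x1, data = within_df))`. The toy data is built so that the within and between correlations have opposite signs, which is the kind of pattern the three plots are meant to reveal.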

2 Answers:

Answer 0: (score: 1)

You can do this with dplyr:

# The within-country variance:
df %>% group_by(country) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [7 x 4]

  country    var(x1)     var(x2)   var(x3)
1       A 0.01689254 0.019945743 0.3459071
2       B 0.11111015 0.014658133 0.1578417
3       C 0.01376573 0.004341126 0.3684358
4       D 0.05922682 0.030828768 0.4438790
5       E 0.16660745 0.114101310 0.6562002
6       F 0.30408784 0.029109927 0.3974615
7       G 0.03731913 0.012823557 4.3677278

# The within-year variance:
df %>% group_by(year) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [10 x 4]

   year   var(x1)  var(x2)   var(x3)
1  1990 0.4565977 2.215550 0.2904437
2  1991 0.1906246 2.501216 0.2097600
3  1992 0.2307872 2.103001 0.5223656
4  1993 0.2783625 2.172129 0.8009998
5  1994 0.1505808 2.647259 1.1734290
6  1995 0.1356406 1.794507 1.2216286
7  1996 0.1179536 1.909766 2.9574045
8  1997 0.2380631 2.155005 3.5644637
9  1998 0.1375272 2.085431 5.0101764
10 1999 0.2455796 2.004060 6.2910426

# And the overall variance:

 apply(df[5:7], 2, var)
       x1        x2        x3 
0.2190896 1.8799138 2.0918771 
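The per-country variances above are one number per country, whereas the question asks for a single within standard deviation per variable. One possible way to collapse them, sketched on made-up toy data and assuming an N - 1 denominator for the pooled within variance, is to weight each country's variance by its degrees of freedom:

```r
# Toy balanced panel (made-up numbers, not the Princeton data)
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  x1      = c(1, 2, 3, 4,  11, 12, 13, 14,  21, 22, 23, 24)
)

v <- tapply(panel$x1, panel$country, var)     # per-country variances, as in the table above
n <- tapply(panel$x1, panel$country, length)  # observations per country

# (n - 1) * v recovers each country's sum of squared deviations from its own mean
within_sd  <- sqrt(sum((n - 1) * v) / (sum(n) - 1))

# The between SD is the SD of the country means
between_sd <- sd(tapply(panel$x1, panel$country, mean))

print(c(within = within_sd, between = between_sd))
```

The square root of the overall variance from `apply(df[5:7], 2, var)` gives the third number, the overall standard deviation.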

Answer 1: (score: 1)

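You can do a lot of calculations with these results; the question is whether they are useful for your purpose. What is the goal of your analysis, and what question do you want it to answer?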