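I want to use R to compute the within, between, and overall standard deviations of panel data. I found a very similar question, "Between/within standard deviations in R", but I don't know how to apply its solution to my data.
Let's use the following data as an example:
library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
This gives the following output: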
country year y y_bin x1 x2 x3 opinion
1 A 1990 1342787840 1 0.27790365 -1.1079559 0.28255358 Str agree
2 A 1991 -1899660544 0 0.32068470 -0.9487200 0.49253848 Disag
3 A 1992 -11234363 0 0.36346573 -0.7894840 0.70252335 Disag
4 A 1993 2645775360 1 0.24614404 -0.8855330 -0.09439092 Disag
5 A 1994 3008334848 1 0.42462304 -0.7297683 0.94613063 Disag
6 A 1995 3229574144 1 0.47721413 -0.7232460 1.02968037 Str agree
7 A 1996 2756754176 1 0.49980500 -0.7815716 1.09228814 Disag
8 A 1997 2771810560 1 0.05162839 -0.7048455 1.41590083 Str agree
9 A 1998 3397338880 1 0.36641079 -0.6983712 1.54872274 Disag
10 A 1999 39770336 1 0.39584252 -0.6431540 1.79419804 Str disag
11 B 1990 -5934699520 0 -0.08184998 1.4251202 0.02342812 Agree
12 B 1991 -711623744 0 0.10616001 1.6496018 0.26036251 Str agree
13 B 1992 -1933116160 0 0.35378519 1.5937191 -0.23439877 Agree
14 B 1993 3072741632 1 0.72677696 1.6917576 0.25622433 Str disag
15 B 1994 3768078848 1 0.71939486 1.7414261 0.41174951 Disag
16 B 1995 2837581312 1 0.67154658 1.7083139 0.53584301 Str disag
17 B 1996 577199360 1 0.81985730 1.5324961 -0.49964902 Str agree
18 B 1997 1786851584 1 0.88016719 1.5021962 -0.57626772 Disag
19 B 1998 -149072048 0 0.70451611 1.4236463 -0.44841924 Agree
20 B 1999 -1174480128 0 0.23696731 1.4545859 -0.04936399 Str disag
21 C 1990 -1292379264 0 1.31256068 -1.2931356 0.20408297 Agree
22 C 1991 -3415966464 0 1.17748356 -1.3442180 0.28397188 Str agree
23 C 1992 -355804672 0 1.25640798 -1.2599510 0.37339270 Agree
24 C 1993 1225180032 1 1.42154455 -1.3117452 -0.37596563 Disag
25 C 1994 3802287616 1 1.11419308 -1.2849948 0.56046754 Str disag
26 C 1995 1959696640 1 1.15948391 -1.2188276 0.69540799 Agree
27 C 1996 530576672 1 1.16045427 -1.2350063 0.81689382 Agree
28 C 1997 3128852224 1 1.44641161 -1.3275964 -0.14206907 Str disag
29 C 1998 3201045760 1 1.15162671 -1.2061129 1.19458139 Str agree
30 C 1999 4663067648 1 1.19054413 -1.1266172 1.67016041 Disag
31 D 1990 1883025152 1 -0.31391269 1.7366557 0.64663702 Disag
32 D 1991 6037768704 1 0.36009100 2.1318641 1.09994173 Disag
33 D 1992 10244189 1 0.05188770 1.6816775 0.96976823 Str agree
34 D 1993 5067265024 1 0.20944354 1.6149769 -0.21257821 Str agree
35 D 1994 3882478336 1 0.38207000 1.5683011 -1.16538668 Disag
36 D 1995 8827006976 1 0.24208580 1.5412215 -0.18413101 Agree
37 D 1996 5782000128 1 0.48636678 1.7423391 -0.03731453 Str disag
38 D 1997 5090524160 1 0.35942599 1.8742865 0.08786795 Str agree
39 D 1998 1850565248 1 0.23220351 1.5953021 0.07247547 Disag
40 D 1999 -2025476864 0 -0.07998896 1.7047973 0.55843300 Str agree
41 E 1990 1342787840 1 0.45286715 1.7284026 0.59705788 Str disag
42 E 1991 2296009472 1 0.41904032 1.7068400 0.79313534 Str agree
43 E 1992 1737627776 1 0.38521346 1.6852775 0.98921281 Agree
44 E 1993 113973136 1 -0.24428773 1.6492835 1.22413278 Str agree
45 E 1994 260098048 1 1.39113998 2.5302765 -0.52620137 Str disag
46 E 1995 -7863482880 0 0.31968558 1.1890552 -0.48425370 Agree
47 E 1996 3520491520 1 0.61097682 1.4845277 -0.97895509 Agree
48 E 1997 5234565120 1 0.71761495 1.5544620 -0.98863661 Str disag
49 E 1998 344746176 1 0.69613826 1.7010406 -0.08965246 Disag
50 E 1999 243920688 1 0.60662067 1.6119040 -0.08929884 Str disag
51 F 1990 1342787840 1 -0.56757486 -0.3466710 1.25841928 Str agree
52 F 1991 3560401920 1 0.15974578 -0.4641182 0.32665297 Str disag
53 F 1992 3192281088 1 0.88706642 -0.5815655 -0.60511333 Agree
54 F 1993 8941232128 1 0.53241795 -0.7553238 -0.51157588 Agree
55 F 1994 8124504576 1 0.87260014 -0.7114431 0.20570269 Str agree
56 F 1995 491740096 1 0.91935229 -0.3697441 -0.01292755 Str agree
57 F 1996 3497164544 1 1.39689231 -0.3601406 0.67867643 Str agree
58 F 1997 4764803072 1 0.98688608 -0.3590902 0.24226174 Str agree
59 F 1998 -4671723520 0 0.78830910 -0.7556524 0.73347801 Agree
60 F 1999 6349319168 1 0.27938697 -0.4601679 1.17317200 Disag
61 G 1990 1342787840 1 0.94488174 -1.5150151 1.45265734 Str disag
62 G 1991 -1518985728 0 1.09872830 -1.4614717 1.43964469 Agree
63 G 1992 1912769920 1 1.25257492 -1.4079282 1.42663205 Str agree
64 G 1993 1345690240 1 0.76276451 -1.3519315 1.85448635 Str disag
65 G 1994 2793515008 1 1.20645559 -1.3252175 2.23653030 Str disag
66 G 1995 1323696384 1 1.08718646 -1.4098167 2.82980847 Str disag
67 G 1996 254524176 1 0.78107548 -1.3279996 4.27822399 Str agree
68 G 1997 3297033216 1 1.25787950 -1.5773667 4.58732557 Disag
69 G 1998 3011820800 1 1.24277663 -1.6012177 6.11376190 Disag
70 G 1999 3296283392 1 1.23420024 -1.6217614 7.16892195 Disag
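The within st. dev. should capture the variation within a country over the years, whereas the between st. dev. should capture the variation across countries. So for each variable (here: x1, x2, x3) the output should be three different standard deviations (within, between, and overall). PS: I am also using the plm and reshape2 packages.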
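For reference, the three quantities can be written down as follows. This is only one common convention, assuming a balanced panel with N countries observed over T years each (here N = 7, T = 10); other sources use different divisors, so treat the exact denominators as an assumption rather than a fixed definition.

```latex
\begin{aligned}
\bar{x}_i &= \frac{1}{T}\sum_{t=1}^{T} x_{it}, \qquad
\bar{x} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it} \\
s_{\text{overall}} &= \sqrt{\frac{1}{NT-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(x_{it}-\bar{x}\right)^2} \\
s_{\text{between}} &= \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{x}_i-\bar{x}\right)^2} \\
s_{\text{within}} &= \sqrt{\frac{1}{NT-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(x_{it}-\bar{x}_i\right)^2}
\end{aligned}
```

Here x_it is the value for country i in year t, so the between figure is the standard deviation of the country means and the within figure is the standard deviation of the deviations from each country's own mean.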
Edit: As a second step, I compute the mean for each country:
Panel_mean <- Panel %>% group_by(country) %>% summarise(mean(x1), mean(x2), mean(x3))
and obtain the variance between countries with:
Panel %>% group_by(country) %>% summarise_each(funs(mean), x1, x2, x3) %>%
summarise_each(funs(var), x1, x2, x3)
and the variance between years:
Panel %>% group_by(year) %>% summarise_each(funs(mean), x1, x2, x3) %>%
summarise_each(funs(var), x1, x2, x3)
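Edit 2: Since there were questions, here are my next steps: I want to identify country-specific regressors by plotting the unconditional correlations between y and each regressor. For each variable I want three "groups" of plots:
1. the overall correlation
2. the correlation between the deviations of y and of the regressor from their country means (within variation)
3. the correlation between the country means of the variables (between variation)
Here is an example of the desired output:
For the overall correlation, I think I can simply use lm (rather than plm, which is meant for panel data analysis), like this:
plot(Panel$x1, Panel$y)
abline(lm(y ~ x1, data = Panel))
Or am I on completely the wrong track?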
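The three plots described above could be sketched along the following lines. This is a minimal base-R illustration, not a definitive method: the data frame toy and its values are made up for the example (they are not the Panel101 data), and the column names y_w and x1_w are hypothetical helpers for the demeaned variables.

```r
# Toy balanced panel with made-up values (NOT the Panel101 data)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  x1      = c(1, 2, 3, 4,  2, 4, 6, 8,  5, 5, 7, 7),
  y       = c(12, 14, 16, 18,  4, 8, 12, 16,  0, 0, 4, 4)
)

# Within variation: deviations of y and x1 from their country means
toy$y_w  <- toy$y  - ave(toy$y,  toy$country)
toy$x1_w <- toy$x1 - ave(toy$x1, toy$country)

# Between variation: one row per country holding the country means
means <- aggregate(cbind(y, x1) ~ country, data = toy, FUN = mean)

# Simple linear fits for the three views of the data
fit_overall <- lm(y ~ x1, data = toy)
fit_within  <- lm(y_w ~ x1_w, data = toy)
fit_between <- lm(y ~ x1, data = means)

# Three panels side by side, each with its fitted line
op <- par(mfrow = c(1, 3))
plot(toy$x1, toy$y, main = "Overall");       abline(fit_overall)
plot(toy$x1_w, toy$y_w, main = "Within");    abline(fit_within)
plot(means$x1, means$y, main = "Between");   abline(fit_between)
par(op)
```

With the real data, Panel would take the place of toy. The within plot uses only the variation around each country's own mean, while the between plot has just one point per country, so with seven countries it rests on seven observations.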
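Answer 0 (score: 1)
You can use dplyr (df below is assumed to be the Panel data frame from the question):
library(dplyr)
df <- Panel
# The within-country variance:
df %>% group_by(country) %>% summarise(var(x1), var(x2), var(x3))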
Source: local data frame [7 x 4]
country var(x1) var(x2) var(x3)
1 A 0.01689254 0.019945743 0.3459071
2 B 0.11111015 0.014658133 0.1578417
3 C 0.01376573 0.004341126 0.3684358
4 D 0.05922682 0.030828768 0.4438790
5 E 0.16660745 0.114101310 0.6562002
6 F 0.30408784 0.029109927 0.3974615
7 G 0.03731913 0.012823557 4.3677278
# The within-year variance:
df %>% group_by(year) %>% summarise(var(x1), var(x2), var(x3))
Source: local data frame [10 x 4]
year var(x1) var(x2) var(x3)
1 1990 0.4565977 2.215550 0.2904437
2 1991 0.1906246 2.501216 0.2097600
3 1992 0.2307872 2.103001 0.5223656
4 1993 0.2783625 2.172129 0.8009998
5 1994 0.1505808 2.647259 1.1734290
6 1995 0.1356406 1.794507 1.2216286
7 1996 0.1179536 1.909766 2.9574045
8 1997 0.2380631 2.155005 3.5644637
9 1998 0.1375272 2.085431 5.0101764
10 1999 0.2455796 2.004060 6.2910426
# And the overall variance:
apply(df[5:7], 2, var)
x1 x2 x3
0.2190896 1.8799138 2.0918771
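The output above reports variances, one per country or per year. To get a single overall, between, and within standard deviation per variable, as the question asks, something like the following could work. It is a sketch under stated assumptions: panel_sd is a hypothetical helper (not part of dplyr or plm), the toy data are made up, and the within figure uses the pooled n - 1 divisor, which is only one of several conventions.

```r
# Toy balanced panel with made-up values (NOT the Panel101 data)
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year    = rep(1990:1993, times = 3),
  x1      = c(1, 2, 3, 4,  2, 4, 6, 8,  5, 5, 7, 7)
)

# Hypothetical helper: overall, between and within SD of one variable
panel_sd <- function(x, group) {
  group_mean <- ave(x, group)            # each unit's own mean, repeated per row
  unit_means <- tapply(x, group, mean)   # one mean per unit
  c(overall = sd(x),                     # spread of all observations
    between = sd(unit_means),            # spread of the unit means
    within  = sd(x - group_mean))        # spread around each unit's own mean
}

panel_sd(toy$x1, toy$country)
```

Applied to the question's data, a call such as sapply(Panel[c("x1", "x2", "x3")], panel_sd, group = Panel$country) should return one column per variable with the three standard deviations as rows.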
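Answer 1 (score: 1)
You can do a lot of calculations with these results; the question is whether they are useful for your purpose. What is the goal of your analysis, and what question do you want it to answer?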