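I have GDP time series data for 3 countries. I want to turn the data set into a panel for further panel analysis, but I don't know how to build it with the reshape package.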
AT CZ DE
1995 68410.7 30457.3 630631.5
1995.25 68353.5 30213.1 625515.3
1995.5 68103.3 29766.4 623124.0
1995.75 67896.0 29661.8 621122.0
1996 67888.8 29595.8 616673.1
1996.25 67874.5 29880.0 616645.4
I found that I can reshape the data like this:
long <- reshape(as.data.frame(GDP.series),varying = list(names(GDP.series)), v.names="GDP",
timevar = "Country", idvar = "time", ids = row.names(GDP.series),
times = names(GDP.series), new.row.names = 1:((dim(GDP.series)[2])*(dim(GDP.series)[1])),direction = "long")
After that, the data looks like this:
Country GDP
1 AT 49149.0
2 AT 49555.5
3 AT 49475.9
4 AT 49507.6
5 AT 49888.9
6 AT 50324.5
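The problem with this transformation is that the information about the time period is lost. I am a beginner and do not understand everything the code does, especially this part: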
"new.row.names = 1:((dim(GDP.series)[2])*(dim(GDP.series)[1])),direction = "long""
So my question is how to improve or change the code so that the data ends up in the following format:
Country GDP
2013 AT 49149.0
2012.75 AT 49555.5
2012.5 AT 49475.9
2012.25 AT 49507.6
2011 AT 49888.9
2011.75 AT 50324.5
Or do I need to use a different function? Thanks in advance. (The code is taken from this thread: Data Transformation in R for Panel Regression)
Answer 0 (score: 0)
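This should answer your question. Keep in mind, though, that the row names of a data frame must be unique, so you cannot have the periods as row names while all countries are stacked in one data frame. The period is kept in a regular column instead. Check my output: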
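# Toy data: one column per country, one row per period
data = data.frame(AT = 1:6,CZ = 11:16,DE = 21:26)
rownames(data) = c(2013,2012.75, 2012.5 ,2012.25 ,2011,2011.75)
# Copy the period out of the row names into a regular column so melt() keeps it
data$row = rownames(data)
library(reshape2)
# Stack the country columns into Country/GDP pairs, repeating "row" per country
data1 = melt(data, id.vars = "row", measure.vars = c("AT","CZ","DE"),
value.name = "GDP", variable.name = "Country")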
data1
row Country GDP
1 2013 AT 1
2 2012.75 AT 2
3 2012.5 AT 3
4 2012.25 AT 4
5 2011 AT 5
6 2011.75 AT 6
7 2013 CZ 11
8 2012.75 CZ 12
9 2012.5 CZ 13
10 2012.25 CZ 14
11 2011 CZ 15
12 2011.75 CZ 16
13 2013 DE 21
14 2012.75 DE 22
15 2012.5 DE 23
16 2012.25 DE 24
17 2011 DE 25
18 2011.75 DE 26
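If you would rather stay with base R's reshape(), the same idea should work there too: put the period into a real column first and name that column in idvar. The sketch below assumes that GDP.series is a quarterly multivariate ts object (the question does not say so explicitly); the small GDP.series built here is only a stand-in made from the first four rows of the question, and the names wide and countries are made up for illustration.

```r
# Hypothetical stand-in for GDP.series (assumed to be a quarterly ts matrix)
GDP.series <- ts(
  cbind(AT = c(68410.7, 68353.5, 68103.3, 67896.0),
        CZ = c(30457.3, 30213.1, 29766.4, 29661.8),
        DE = c(630631.5, 625515.3, 623124.0, 621122.0)),
  start = c(1995, 1), frequency = 4
)

# Copy the period out of the ts attributes into an ordinary column
wide <- data.frame(time = as.numeric(time(GDP.series)),
                   as.data.frame(GDP.series))

countries <- colnames(GDP.series)

long <- reshape(wide,
                direction = "long",
                varying   = list(countries),  # columns to stack
                v.names   = "GDP",            # name of the stacked value column
                timevar   = "Country",        # column that records the source column
                times     = countries,        # values written into that column
                idvar     = "time",           # existing column identifying each period
                # plain 1..(rows x countries) row names for the stacked result
                new.row.names = seq_len(nrow(wide) * length(countries)))

head(long)
```

On the part you asked about: new.row.names only supplies the row labels of the long result, here simply the numbers from 1 to (number of periods) times (number of countries), which is what the dim() expression in your code computes. If it is left out, reshape() builds the labels by pasting the idvar value and the times value together. direction = "long" just says to go from wide to long.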
If what you want is a list of data frames, one per country, then use dlply():
library(plyr)
dlply(data1, .(Country), function(x) {rownames(x) = x$row;x$row = NULL;x})
$AT
Country GDP
2013 AT 1
2012.75 AT 2
2012.5 AT 3
2012.25 AT 4
2011 AT 5
2011.75 AT 6
$CZ
Country GDP
2013 CZ 11
2012.75 CZ 12
2012.5 CZ 13
2012.25 CZ 14
2011 CZ 15
2011.75 CZ 16
$DE
Country GDP
2013 DE 21
2012.75 DE 22
2012.5 DE 23
2012.25 DE 24
2011 DE 25
2011.75 DE 26
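If you prefer not to load plyr, roughly the same list can probably be built with base R's split() and lapply(). This is only a sketch: the data1 below is a shortened, hand-typed stand-in for the melt() output above, and by_country is a made-up name.

```r
# Shortened stand-in for the long data frame produced by melt()
data1 <- data.frame(
  row     = rep(c("2013", "2012.75", "2012.5"), times = 3),
  Country = rep(c("AT", "CZ", "DE"), each = 3),
  GDP     = c(1:3, 11:13, 21:23)
)

# One data frame per country; inside each piece the periods are unique,
# so they can safely be moved back into the row names
by_country <- lapply(split(data1, data1$Country), function(x) {
  rownames(x) <- x$row
  x$row <- NULL
  x
})

by_country$CZ
```

Within a single country each period occurs only once, which is why the row names are allowed here but not in the stacked data frame.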