Reshaping World Bank data format

Time: 2016-01-19 10:14:26

Tags: r reshape

I am working with some World Bank panel data (with NA values) that has the following structure:

    df <- read.table(text="
    Indicator Country 1996 1997 1998
    X         A       v1   NA   v3
    X         B       v4   v5   v6
    X         C       NA   v8   v9
    Y         A       z1   NA   z3
    Y         B       NA   NA   z6
    Y         C       z7   z8   z9", header = TRUE, check.names = FALSE)

I would like to get this structure:

    Country  Year   X    Y
    A        1996   v1   z1
    A        1997   NA   NA
    A        1998   v3   z3
    B        1996   v4   NA
    B        1997   v5   NA
    B        1998   v6   z6
    C        1996   NA   z7
    C        1997   v8   z8
    C        1998   v9   z9

I tried the answer given in Reshaping data.frame from wide to long format, using the following code:

    df.reshaped=reshape(df, direction="long", varying=list(names(df)[3:5]), 
    v.names=c("X", "Y"), idvar= "Country", times=1996:1998)

But I did not get what I wanted. The real flat file contains nearly 20 indicators × 214 countries × 35 years, so I am asking for your help.
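The attempt above does not produce the target because reshape() reads v.names as the names of long-format variables built from the varying columns, whereas here X and Y are values inside the Indicator column. A minimal base-R sketch, assuming the example df from the question, splits the job into two reshape() calls: the years go long first, then Indicator goes wide. The names years, long, wide and the value column are illustrative choices, and stringsAsFactors = FALSE plus check.names = FALSE are assumed so the year headers stay as 1996, 1997, 1998.

```r
# Example data from the question; check.names = FALSE keeps "1996" etc. as
# column names (otherwise read.table() would turn them into "X1996", ...).
df <- read.table(text = "
Indicator Country 1996 1997 1998
X         A       v1   NA   v3
X         B       v4   v5   v6
X         C       NA   v8   v9
Y         A       z1   NA   z3
Y         B       NA   NA   z6
Y         C       z7   z8   z9", header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

years <- names(df)[3:5]

# Step 1 (wide -> long): stack the year columns into a single "value"
# column and record the year in "Year"; reshape() adds a temporary "id".
long <- reshape(df, direction = "long", varying = years,
                v.names = "value", timevar = "Year",
                times = as.integer(years))
long$id <- NULL
attr(long, "reshapeLong") <- NULL

# Step 2 (long -> wide): one column per Indicator, one row per Country/Year
wide <- reshape(long, direction = "wide", idvar = c("Country", "Year"),
                timevar = "Indicator", v.names = "value")
names(wide) <- sub("^value\\.", "", names(wide))

# Order like the target table and drop reshape()'s generated row names
wide <- wide[order(wide$Country, wide$Year), ]
rownames(wide) <- NULL
wide
```

Doing it in two passes keeps each reshape() call simple: the first only knows about years, the second only about indicators, so the same code should scale to 20 indicators and 35 year columns by changing years.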

2 Answers:

Answer 0 (score: 2)

We can use melt/dcast from data.table:

    library(data.table)
    # Melt the year columns to long form, then cast Indicator back out to columns
    dcast(melt(setDT(df), id.vars = c("Indicator", "Country"),
               variable.name = "year"),
          Country + year ~ Indicator, value.var = "value")
    #   Country year  X  Y
    #1:       A 1996 v1 z1
    #2:       A 1997 NA NA
    #3:       A 1998 v3 z3
    #4:       B 1996 v4 NA
    #5:       B 1997 v5 NA
    #6:       B 1998 v6 z6
    #7:       C 1996 NA z7
    #8:       C 1997 v8 z8
    #9:       C 1998 v9 z9
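By default melt() stores the former column names in year as a factor. A minimal sketch, assuming the same df (re-read here so the block runs on its own) and that a numeric year is wanted for the full 35-year panel, passes variable.factor = FALSE and converts afterwards; long and res are illustrative names.

```r
library(data.table)

df <- read.table(text = "
Indicator Country 1996 1997 1998
X         A       v1   NA   v3
X         B       v4   v5   v6
X         C       NA   v8   v9
Y         A       z1   NA   z3
Y         B       NA   NA   z6
Y         C       z7   z8   z9", header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

# variable.factor = FALSE keeps the melted year labels as character
long <- melt(setDT(df), id.vars = c("Indicator", "Country"),
             variable.name = "year", variable.factor = FALSE)
res <- dcast(long, Country + year ~ Indicator, value.var = "value")

# Convert "1996", ... to integer years (modifies res by reference)
res[, year := as.integer(year)]
res
```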

Answer 1 (score: 2)

For reference, you can do something similar in base R with a combination of reshape + stack:

    reshape(cbind(df[c(1, 2)], 
                  stack(lapply(df[-c(1, 2)], as.character))), 
            direction = "wide", 
            idvar = c("Country", "ind"), 
            timevar = "Indicator")
    #    Country  ind values.X values.Y
    # 1        A 1996       v1       z1
    # 2        B 1996       v4     <NA>
    # 3        C 1996     <NA>       z7
    # 7        A 1997     <NA>     <NA>
    # 8        B 1997       v5     <NA>
    # 9        C 1997       v8       z8
    # 13       A 1998       v3       z3
    # 14       B 1998       v6       z6
    # 15       C 1998       v9       z9
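The base R result keeps reshape()'s generated names (ind, values.X, values.Y) and row names. A small clean-up sketch, assuming the same df with character columns, renames and reorders them to match the target table; the renaming and sorting steps are additions for illustration, not part of the answer above.

```r
df <- read.table(text = "
Indicator Country 1996 1997 1998
X         A       v1   NA   v3
X         B       v4   v5   v6
X         C       NA   v8   v9
Y         A       z1   NA   z3
Y         B       NA   NA   z6
Y         C       z7   z8   z9", header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

# Same reshape + stack call as in the answer
res <- reshape(cbind(df[c(1, 2)],
                     stack(lapply(df[-c(1, 2)], as.character))),
               direction = "wide",
               idvar = c("Country", "ind"),
               timevar = "Indicator")

# Clean-up: "values.X" -> "X", "ind" (a factor of year labels) -> integer "Year"
names(res) <- sub("^values\\.", "", names(res))
names(res)[names(res) == "ind"] <- "Year"
res$Year <- as.integer(as.character(res$Year))

# Sort by country, then year, and reset the row names
res <- res[order(res$Country, res$Year), ]
rownames(res) <- NULL
res
```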

And, in the Hadleyverse®, gather + spread:

    library(dplyr)
    library(tidyr)
    df %>%
      gather(Year, value, -Country, -Indicator) %>%
      spread(Indicator, value)
    #   Country Year    X    Y
    # 1       A 1996   v1   z1
    # 2       A 1997 <NA> <NA>
    # 3       A 1998   v3   z3
    # 4       B 1996   v4 <NA>
    # 5       B 1997   v5 <NA>
    # 6       B 1998   v6   z6
    # 7       C 1996 <NA>   z7
    # 8       C 1997   v8   z8
    # 9       C 1998   v9   z9
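gather() and spread() still work but have been superseded since tidyr 1.0.0 by pivot_longer() and pivot_wider(). A sketch of the same reshape with the newer verbs, assuming tidyr 1.0.0 or later and the same df; converting Year to integer and the final arrange() are extra steps, not part of the answer above.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "
Indicator Country 1996 1997 1998
X         A       v1   NA   v3
X         B       v4   v5   v6
X         C       NA   v8   v9
Y         A       z1   NA   z3
Y         B       NA   NA   z6
Y         C       z7   z8   z9", header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

res <- df %>%
  # Every column except the identifiers is a year: stack them into Year/value
  pivot_longer(-c(Indicator, Country), names_to = "Year",
               values_to = "value") %>%
  # Spread the indicators back out, one column per Indicator value
  pivot_wider(names_from = Indicator, values_from = value) %>%
  mutate(Year = as.integer(Year)) %>%
  arrange(Country, Year)
res
```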