Formatting an R data.frame - switching observation columns to variables and vice versa?

Time: 2013-10-01 08:34:07

Tags: r dataframe reshape

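My data.frame originally comes from a txt file. Its layout is rather inconvenient: the observations for each year are spread across columns, while the actual variables I need as regressors in my analysis are stacked in a single column as a factor. So I need to apply the following transformation to this data.frame: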

         VAR    YEAR.1    YEAR.2    YEAR.3
FIRM.1   VAR.1  FV_11.1   FV_11.2   FV_11.3 
FIRM.1   VAR.2  FV_12.1   FV_12.2   FV_12.3
FIRM.2   VAR.1  FV_21.1   FV_21.2   FV_21.3
FIRM.2   VAR.2  FV_22.1   FV_22.2   FV_22.3

where FV_ij.k is the value of variable j for firm i observed in year k. Ideally, the final data.frame would look like this:

         YEAR    VAR.1    VAR.2
 FIRM.1  YEAR.1  FV_11.1  FV_12.1
 FIRM.1  YEAR.2  FV_11.2  FV_12.2
 FIRM.1  YEAR.3  FV_11.3  FV_12.3
 FIRM.2  YEAR.1  FV_21.1  FV_22.1
 FIRM.2  YEAR.2  FV_21.2  FV_22.2
 FIRM.2  YEAR.3  FV_21.3  FV_22.3     

I know how to code this by hand, but it is cumbersome. Is there a package that makes this kind of transformation convenient?

1 answer:

Answer 0 (score: 2)

I would suggest melt and dcast from the "reshape2" package. But first, here is some sample data:

mydf <- structure(list(FIRM = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
    VAR = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"), YEAR.1 = c("FV_11.1",
    "FV_12.1", "FV_21.1", "FV_22.1"), YEAR.2 = c("FV_11.2", "FV_12.2",
    "FV_21.2", "FV_22.2"), YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3",
    "FV_22.3")), .Names = c("FIRM", "VAR", "YEAR.1", "YEAR.2", "YEAR.3"),
    class = "data.frame", row.names = c(NA, -4L))
mydf
#     FIRM   VAR  YEAR.1  YEAR.2  YEAR.3
# 1 FIRM.1 VAR.1 FV_11.1 FV_11.2 FV_11.3
# 2 FIRM.1 VAR.2 FV_12.1 FV_12.2 FV_12.3
# 3 FIRM.2 VAR.1 FV_21.1 FV_21.2 FV_21.3
# 4 FIRM.2 VAR.2 FV_22.1 FV_22.2 FV_22.3

Step 1: Convert the data to long format. Before doing that, though, remove the "VAR." prefix from the "VAR" column.

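library(reshape2)
# Strip the "VAR." prefix; fixed = TRUE matches "." literally instead of as a regex wildcard
mydf$VAR <- gsub("VAR.", "", mydf$VAR, fixed = TRUE)
# Stack YEAR.1..YEAR.3 into "variable"/"value" columns, keeping FIRM and VAR as id columns
out <- melt(mydf, id.vars=c("FIRM", "VAR"))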
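If it helps to see what the intermediate long data looks like before casting, here is a small self-contained sketch that rebuilds the sample data (with data.frame() instead of structure(), same values) and inspects the result of Step 1. It only assumes that reshape2 is installed; melt's default column names "variable" and "value" are what the dcast call in Step 2 relies on.

```r
library(reshape2)

# Same sample data as `mydf` above, built with data.frame() for readability
mydf <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
  VAR    = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"),
  YEAR.1 = c("FV_11.1", "FV_12.1", "FV_21.1", "FV_22.1"),
  YEAR.2 = c("FV_11.2", "FV_12.2", "FV_21.2", "FV_22.2"),
  YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3", "FV_22.3"),
  stringsAsFactors = FALSE
)
mydf$VAR <- gsub("VAR.", "", mydf$VAR, fixed = TRUE)

# Every non-id column is stacked: its name goes to `variable`, its cell to `value`
out <- melt(mydf, id.vars = c("FIRM", "VAR"))

# 12 rows (4 FIRM/VAR rows x 3 YEAR columns) and 4 columns
str(out)
# The four YEAR.1 rows come first, then the YEAR.2 rows, then YEAR.3
head(out, 4)
```

Each original row contributes one long row per YEAR column, which is why dcast can later spread `value` back out by VAR.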

Step 2: Use dcast to reshape the data into the format you want:

dcast(out, FIRM + variable ~ VAR, value.var = "value")
#     FIRM variable       1       2
# 1 FIRM.1   YEAR.1 FV_11.1 FV_12.1
# 2 FIRM.1   YEAR.2 FV_11.2 FV_12.2
# 3 FIRM.1   YEAR.3 FV_11.3 FV_12.3
# 4 FIRM.2   YEAR.1 FV_21.1 FV_22.1
# 5 FIRM.2   YEAR.2 FV_21.2 FV_22.2
# 6 FIRM.2   YEAR.3 FV_21.3 FV_22.3
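For readers on newer package versions, the same two-step reshape can also be written with tidyr's pivot_longer() and pivot_wider(). This is a swapped-in alternative, not part of the answer above, and it assumes tidyr 1.0.0 or later (these functions did not exist when the question was asked). Keeping the "VAR." prefix here makes the new columns come out as VAR.1 and VAR.2, and names_to = "YEAR" names the year column, matching the layout the question asked for.

```r
library(tidyr)

# Sample data as in the question (the "VAR." prefix is kept on purpose)
mydf <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
  VAR    = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"),
  YEAR.1 = c("FV_11.1", "FV_12.1", "FV_21.1", "FV_22.1"),
  YEAR.2 = c("FV_11.2", "FV_12.2", "FV_21.2", "FV_22.2"),
  YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3", "FV_22.3"),
  stringsAsFactors = FALSE
)

# Long: one row per FIRM x VAR x YEAR; old column names go into YEAR
long <- pivot_longer(mydf, cols = starts_with("YEAR"),
                     names_to = "YEAR", values_to = "value")

# Wide: one column per distinct VAR, rows keyed by the remaining columns (FIRM, YEAR)
wide <- pivot_wider(long, names_from = VAR, values_from = value)
wide
```

The result is a tibble with columns FIRM, YEAR, VAR.1 and VAR.2 and one row per firm and year.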