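My data.frame originally comes from a txt file. Its layout is rather inconvenient: the observations for each year are spread across separate columns, while the actual variables that I need as regressors in my analysis are stacked in a single column as a factor. So I need to transform this data.frame from the following form: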
        VAR    YEAR.1   YEAR.2   YEAR.3
FIRM.1  VAR.1  FV_11.1  FV_11.2  FV_11.3
FIRM.1  VAR.2  FV_12.1  FV_12.2  FV_12.3
FIRM.2  VAR.1  FV_21.1  FV_21.2  FV_21.3
FIRM.2  VAR.2  FV_22.1  FV_22.2  FV_22.3
where FV_ij.k is the value of variable j observed for firm i in year k. Ideally, the final data.frame would look like this:
        YEAR    VAR.1    VAR.2
FIRM.1  YEAR.1  FV_11.1  FV_12.1
FIRM.1  YEAR.2  FV_11.2  FV_12.2
FIRM.1  YEAR.3  FV_11.3  FV_12.3
FIRM.2  YEAR.1  FV_21.1  FV_22.1
FIRM.2  YEAR.2  FV_21.2  FV_22.2
FIRM.2  YEAR.3  FV_21.3  FV_22.3
I know how to code this by hand, but it is cumbersome. Is there a package that makes this kind of transformation convenient?
Answer 0 (score: 2)
I would suggest melt and dcast from the "reshape2" package. But first, here is some sample data:
mydf <- structure(list(FIRM = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
VAR = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"), YEAR.1 = c("FV_11.1",
"FV_12.1", "FV_21.1", "FV_22.1"), YEAR.2 = c("FV_11.2", "FV_12.2",
"FV_21.2", "FV_22.2"), YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3",
"FV_22.3")), .Names = c("FIRM", "VAR", "YEAR.1", "YEAR.2", "YEAR.3"),
class = "data.frame", row.names = c(NA, -4L))
mydf
#     FIRM   VAR  YEAR.1  YEAR.2  YEAR.3
# 1 FIRM.1 VAR.1 FV_11.1 FV_11.2 FV_11.3
# 2 FIRM.1 VAR.2 FV_12.1 FV_12.2 FV_12.3
# 3 FIRM.2 VAR.1 FV_21.1 FV_21.2 FV_21.3
# 4 FIRM.2 VAR.2 FV_22.1 FV_22.2 FV_22.3
Step 1: Convert the data to long format. Before doing that, though, strip the "VAR." prefix from the values in the "VAR" column.
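library(reshape2)
# Strip the literal "VAR." prefix; fixed = TRUE stops "." from acting as a regex wildcard
mydf$VAR <- sub("VAR.", "", mydf$VAR, fixed = TRUE)
# Stack the YEAR.* columns into "variable" (the year label) and "value" (the observation)
out <- melt(mydf, id.vars = c("FIRM", "VAR"))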
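To make it easier to see what melt does before moving on, here is a minimal sketch on a smaller data frame. The object names toy and toy_long and the numbers in it are made up for illustration only; they are not part of the data above.

```r
library(reshape2)

# Hypothetical toy data: one firm, two variables, two years (made-up numbers)
toy <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1"),
  VAR    = c("1", "2"),
  YEAR.1 = c(10, 20),
  YEAR.2 = c(11, 21)
)

# Every column not listed in id.vars is stacked into variable/value pairs
toy_long <- melt(toy, id.vars = c("FIRM", "VAR"))
toy_long
#     FIRM VAR variable value
# 1 FIRM.1   1   YEAR.1    10
# 2 FIRM.1   2   YEAR.1    20
# 3 FIRM.1   1   YEAR.2    11
# 4 FIRM.1   2   YEAR.2    21
```

Each original row contributes one long-format row per year column, which is the shape dcast needs.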
Step 2: Use dcast to convert the data to the format you want.
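# Naming value.var explicitly avoids the message dcast prints when it has to guess the value column
dcast(out, FIRM + variable ~ VAR, value.var = "value")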
#     FIRM variable       1       2
# 1 FIRM.1   YEAR.1 FV_11.1 FV_12.1
# 2 FIRM.1   YEAR.2 FV_11.2 FV_12.2
# 3 FIRM.1   YEAR.3 FV_11.3 FV_12.3
# 4 FIRM.2   YEAR.1 FV_21.1 FV_22.1
# 5 FIRM.2   YEAR.2 FV_21.2 FV_22.2
# 6 FIRM.2   YEAR.3 FV_21.3 FV_22.3
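If you would rather end up with columns named YEAR, VAR.1 and VAR.2, as in the layout the question asks for, a possible variant is to skip the prefix-stripping step and name the melted column directly. This is only a sketch: the object names firms, long and wide are my own choices, and it relies on the variable.name argument of melt and the value.var argument of dcast.

```r
library(reshape2)

# Same placeholder data as in the question, rebuilt so the sketch is self-contained
firms <- data.frame(
  FIRM   = c("FIRM.1", "FIRM.1", "FIRM.2", "FIRM.2"),
  VAR    = c("VAR.1", "VAR.2", "VAR.1", "VAR.2"),
  YEAR.1 = c("FV_11.1", "FV_12.1", "FV_21.1", "FV_22.1"),
  YEAR.2 = c("FV_11.2", "FV_12.2", "FV_21.2", "FV_22.2"),
  YEAR.3 = c("FV_11.3", "FV_12.3", "FV_21.3", "FV_22.3"),
  stringsAsFactors = FALSE
)

# Long format; the stacked year labels go into a column called YEAR
long <- melt(firms, id.vars = c("FIRM", "VAR"), variable.name = "YEAR")

# Wide format; the untouched VAR labels become the new column names
wide <- dcast(long, FIRM + YEAR ~ VAR, value.var = "value")
names(wide)
# [1] "FIRM"  "YEAR"  "VAR.1" "VAR.2"
```

The rows come out in the same order as in Step 2; only the column names differ.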