Question

我正在尝试重塑大型数据集，并且无法按照我想要的顺序获得正确的结果。

以下是数据的样子：

GeoFIPS GeoName IndustryID  Description X2001   X2002   X2003   X2004   X2005 
10180   Abilene, TX     21  Mining      96002   92407   127138 150449   202926
10180   Abilene, TX     22  Utilities   33588   34116   33105   33265   32452
...

数据框很长，包括美国所有选定行业的MSA。

我希望它看起来像这样：

GeoFIPS GeoName        Year     Mining Utilities (etc)
10180   Abilene, TX    2001     96002   33588
10180   Abilene, TX    2002     92407   34116
....

我对R很新，非常感谢你的帮助。我已经检查了广泛到长，长到宽，但这似乎是一个更复杂的情况。谢谢！

编辑：数据

df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX", 
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining", 
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 = 
c(202926L, 
 32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description", 
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
 row.names = c(NA, -2L))

Answer 1

您可以使用melt/dcast

中的reshape2

library(reshape2)
df2 <- melt(df1, id.var=c('GeoFIPS', 'GeoName', 
               'IndustryID', 'Description'))
df2 <- transform(df2, Year=sub('^X', '', variable))[-c(3,5)]


dcast(df2, ...~Description, value.var='value')
#  GeoFIPS     GeoName Year Mining Utilities
#1   10180 Abilene, TX 2001  96002     33588
#2   10180 Abilene, TX 2002  92407     34116
#3   10180 Abilene, TX 2003 127138     33105
#4   10180 Abilene, TX 2004 150449     33265
#5   10180 Abilene, TX 2005 202926     32452

数据

df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX", 
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining", 
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 = 
c(202926L, 
 32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description", 
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
 row.names = c(NA, -2L))

在R中重塑大型数据集

1 个答案:

数据