I am trying to reshape a large dataset and cannot get the result into the layout I want.
Here is what the data looks like:
GeoFIPS GeoName IndustryID Description X2001 X2002 X2003 X2004 X2005
10180 Abilene, TX 21 Mining 96002 92407 127138 150449 202926
10180 Abilene, TX 22 Utilities 33588 34116 33105 33265 32452
...
The data frame is long and covers every MSA in the United States for the selected industries.
I would like it to look like this:
GeoFIPS GeoName Year Mining Utilities (etc)
10180 Abilene, TX 2001 96002 33588
10180 Abilene, TX 2002 92407 34116
....
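I am very new to R and would really appreciate your help. I have looked at wide-to-long and long-to-wide reshaping, but this seems to be a more complicated case. Thanks!
Edit: data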
df1 <- structure(list(
  GeoFIPS = c(10180L, 10180L),
  GeoName = c("Abilene, TX", "Abilene, TX"),
  IndustryID = 21:22,
  Description = c("Mining", "Utilities"),
  X2001 = c(96002L, 33588L),
  X2002 = c(92407L, 34116L),
  X2003 = c(127138L, 33105L),
  X2004 = c(150449L, 33265L),
  X2005 = c(202926L, 32452L)),
  .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description",
             "X2001", "X2002", "X2003", "X2004", "X2005"),
  class = "data.frame", row.names = c(NA, -2L))
Answer 0 (score: 2)
You can use melt/dcast from reshape2:
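library(reshape2)
# Wide to long: stack the X2001..X2005 columns into variable/value pairs
df2 <- melt(df1, id.vars=c('GeoFIPS', 'GeoName',
                'IndustryID', 'Description'))
# Strip the leading 'X' to get Year, then drop IndustryID (3) and variable (5)
df2 <- transform(df2, Year=sub('^X', '', variable))[-c(3,5)]
# Long to wide: one column per Description
dcast(df2, ...~Description, value.var='value')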
# GeoFIPS GeoName Year Mining Utilities
#1 10180 Abilene, TX 2001 96002 33588
#2 10180 Abilene, TX 2002 92407 34116
#3 10180 Abilene, TX 2003 127138 33105
#4 10180 Abilene, TX 2004 150449 33265
#5 10180 Abilene, TX 2005 202926 32452
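If you would rather not load a package, the same two steps (wide to long, then long to wide) can be sketched in base R. This is only an illustration on the two sample rows from the question, not part of the reshape2 approach above: the names `year_cols`, `long` and `wide` are made up for the example, and it assumes every year column is named `X` followed by four digits.

```r
# Sample rows from the question, rebuilt with data.frame()
df1 <- data.frame(
  GeoFIPS = c(10180L, 10180L),
  GeoName = c("Abilene, TX", "Abilene, TX"),
  IndustryID = 21:22,
  Description = c("Mining", "Utilities"),
  X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L),
  X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L),
  X2005 = c(202926L, 32452L)
)

# Assumption: year columns are exactly "X" + four digits
year_cols <- grep("^X[0-9]{4}$", names(df1), value = TRUE)

# Step 1 (wide to long): stack the year columns on top of each other.
# unlist() concatenates column by column, so Year uses each = nrow(df1).
long <- data.frame(
  GeoFIPS = rep(df1$GeoFIPS, times = length(year_cols)),
  GeoName = rep(df1$GeoName, times = length(year_cols)),
  Description = rep(df1$Description, times = length(year_cols)),
  Year = rep(sub("^X", "", year_cols), each = nrow(df1)),
  value = unlist(df1[year_cols], use.names = FALSE)
)

# Step 2 (long to wide): one column per Description
wide <- reshape(long, direction = "wide",
                idvar = c("GeoFIPS", "GeoName", "Year"),
                timevar = "Description", v.names = "value")

# reshape() names the new columns "value.<Description>"; drop the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

On the full data, the `idvar` columns need to identify each MSA/year combination uniquely, otherwise `reshape()` will warn about multiple rows matching.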