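I have two groups of variables: annual percentage share and year. The annual percentage share columns run from 1999 to 2012, while the year columns run from 1990 to 2013.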
countrylabel annualpercentageshare.1999 year1990 year1991 year1992
1 Austria NA NA NA NA
2 Belgium NA NA NA NA
3 Bulgaria 48.20000 NA NA NA
4 Estonia NA NA NA NA
5 France 47.52853 NA NA NA
6 Germany NA NA NA NA
Something like this.
I have tried the following code:
merge_data2 <- reshape(merge_data2, varying = list(2:ncol(merge_data2)),
v.names = c("percentageshare", "Year"),
idvar = "countrylabel", direction = "long", times = 1990:2013)
But I get this error message:
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 'lengths(varying)' must all match 'length(times)'
Edit: I would like a data frame like this:
countrylabel time annualpercentageshare year
Austria 1990 NA NA
Austria 1991 NA NA
Answer 0: (score: 0)
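library(tidyr); library(dplyr)
df %>%
  # stack every column except countrylabel into variable/value pairs
  gather(variable, value, -countrylabel) %>%
  # split each name before its last four characters (the year)
  separate("variable", into = c("stat", "time"), sep = -4) %>%
  # spread the statistics back out into one column each
  spread(stat, value)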
Output
countrylabel time annualpercentageshare. year
1 Austria 1990 NA NA
2 Austria 1991 NA NA
3 Austria 1992 NA NA
4 Austria 1999 NA NA
5 Belgium 1990 NA NA
6 Belgium 1991 NA NA
7 Belgium 1992 NA NA
8 Belgium 1999 NA NA
9 Bulgaria 1990 NA NA
10 Bulgaria 1991 NA NA
11 Bulgaria 1992 NA NA
12 Bulgaria 1999 48.20000 NA
13 Estonia 1990 NA NA
14 Estonia 1991 NA NA
15 Estonia 1992 NA NA
16 Estonia 1999 NA NA
17 France 1990 NA NA
18 France 1991 NA NA
19 France 1992 NA NA
20 France 1999 47.52853 NA
21 Germany 1990 NA NA
22 Germany 1991 NA NA
23 Germany 1992 NA NA
24 Germany 1999 NA NA
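In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The following is only a sketch of the same idea with pivot_longer, not a drop-in replacement for the code above: the toy data frame is made up for illustration (its two non-missing values are taken from the question), and the regular expression assumes every column name is letters, an optional ".", and a four-digit year.

```r
library(tidyr)

# Toy data in the shape of the question (illustrative, not the real data set)
toy <- data.frame(
  countrylabel = c("Bulgaria", "France"),
  annualpercentageshare.1999 = c(48.2, 47.52853),
  year1990 = c(NA_real_, NA_real_),
  year1991 = c(NA_real_, NA_real_)
)

# ".value" turns the first capture group into output column names;
# the second capture group (the four-digit year) becomes the "time" column.
long <- pivot_longer(
  toy,
  cols = -countrylabel,
  names_to = c(".value", "time"),
  names_pattern = "^([A-Za-z]+)\\.?([0-9]{4})$"
)
long
```

Because the optional "." is matched outside the first capture group, the resulting column is named annualpercentageshare, without the trailing dot seen in the output above. Note that time comes back as character here.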
Answer 1: (score: 0)
reshape likes "." as a separator in variable names, so we first insert one into the year* variables.
names(d) <- gsub("year", "year.", names(d))
Now we give reshape the columns it is missing and sort the columns with order:
d$annualpercentage.2002 <- NA
d$year.1999 <- NA
d <- d[c(1, order(names(d)[-1]) + 1)]
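With the full range of years, adding the missing columns one by one gets tedious. A minimal sketch of doing the padding in a loop, assuming all value columns are already named stat.year; pad_and_order is a hypothetical helper written for this example, not a base R function, and toy is made-up data.

```r
# Hypothetical helper: add an all-NA column for every missing stat/year
# combination, then return the id column followed by the value columns
# sorted by name. Columns that are neither id nor stat.year are dropped.
pad_and_order <- function(d, stats, years, id = "countrylabel") {
  wanted <- as.vector(outer(stats, years, paste, sep = "."))
  for (nm in setdiff(wanted, names(d))) d[[nm]] <- NA_real_
  d[c(id, sort(wanted))]
}

toy <- data.frame(countrylabel = c("A", "B"),
                  annualpercentage.1999 = c(1, 2),
                  year.2000 = c(3, 4))

padded <- pad_and_order(toy, c("annualpercentage", "year"), 1999:2000)
names(padded)
```

After this step both groups have one column per year, in the same year order, which is what reshape needs.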
Your idea then works once varying is a list that defines the separate groups of columns:
res <- reshape(d, varying=list(2:5, 6:9), direction="long", idvar="countrylabel",
times=1999:2002, v.names=c("annualpercentage", "year"))
res
# countrylabel time annualpercentage year
# Austria.1999 Austria 1999 NA NA
# Belgium.1999 Belgium 1999 NA NA
# Bulgaria.1999 Bulgaria 1999 -0.6806495 NA
# Estonia.1999 Estonia 1999 NA NA
# France.1999 France 1999 NA NA
# Germany.1999 Germany 1999 NA NA
# Switzerland.1999 Switzerland 1999 -1.8497570 NA
# Austria.2000 Austria 2000 -0.6033900 0.14970015
# Belgium.2000 Belgium 2000 NA -0.49201756
# Bulgaria.2000 Bulgaria 2000 0.8263925 -0.36320990
# Estonia.2000 Estonia 2000 NA -2.51032544
# France.2000 France 2000 NA 0.57800624
# Germany.2000 Germany 2000 NA -0.52295712
# Switzerland.2000 Switzerland 2000 0.2783076 0.25616728
# Austria.2001 Austria 2001 -2.6962484 -0.15375642
# Belgium.2001 Belgium 2001 1.3088577 0.72528621
# Bulgaria.2001 Bulgaria 2001 NA NA
# Estonia.2001 Estonia 2001 NA -0.05563662
# France.2001 France 2001 0.2224629 0.74205086
# Germany.2001 Germany 2001 NA -0.01185349
# Switzerland.2001 Switzerland 2001 0.8354322 -1.40826638
# Austria.2002 Austria 2002 NA NA
# Belgium.2002 Belgium 2002 NA 1.60874778
# Bulgaria.2002 Bulgaria 2002 NA NA
# Estonia.2002 Estonia 2002 NA 0.55866704
# France.2002 France 2002 NA -1.59866472
# Germany.2002 Germany 2002 NA -0.11217415
# Switzerland.2002 Switzerland 2002 NA NA
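Once the names follow the name.year pattern and both groups cover the same years, reshape can also work out v.names and times by itself: if varying is a plain vector of columns and v.names is omitted, it splits the names at sep (default "."). A small sketch on made-up toy data, not the data from the question:

```r
# Toy data: two balanced groups of columns named <stat>.<year>
toy <- data.frame(
  countrylabel = c("A", "B"),
  annualpercentage.1999 = c(1, NA),
  annualpercentage.2000 = c(2, 3),
  year.1999 = c(NA, 10),
  year.2000 = c(20, 30)
)

# No v.names and no times: reshape guesses both from the column names
long <- reshape(toy, direction = "long", varying = 2:5,
                idvar = "countrylabel")
long
```

As far as I can tell, the guessing takes the years in the order in which they first appear, so the columns of each group should still be in the same year order, which is why the ordering step above matters.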
Data
d <- structure(list(countrylabel = c("Austria", "Belgium", "Bulgaria",
"Estonia", "France", "Germany", "Switzerland"), annualpercentage.1999 = c(NA,
-2.58060150400384, -0.0623757258909573, 0.267776001395166, NA,
NA, 0.048219924249952), annualpercentage.2000 = c(NA, -0.249416955035044,
1.3525450891501, 1.04446768824697, NA, -0.0582347596434839, -0.891400228849837
), annualpercentage.2001 = c(1.82469277697851, NA, NA, 1.04231605324821,
NA, -0.900145118946308, -1.19320727433597), year2000 = c(0.633712375393134,
NA, 1.24760861316098, -0.092964787061478, -0.59403260962332,
NA, -0.650348234181285), year2001 = c(0.587318286831079, NA,
NA, 0.348890470222513, NA, NA, NA), year2002 = c(0.0645316087966406,
-0.279456557428068, NA, NA, -0.0627400036074545, 1.30419117694731,
-0.484654596062051)), row.names = c(NA, -7L), class = "data.frame")