I have the following data frame:
Variables Varcode Country Ccode 2000 2001
1 Power P France FR 1213 1234
2 Happiness H France FR 1872 2345
3 Power P UK UK 1726 6433
4 Happiness H UK UK 2234 9082
I would like to reshape this dataframe as follows:
Year Country Ccode P(label=Power) H(label=Happiness)
1 2000 France FR 1213 1872
2 2001 France FR 1234 2345
3 2000 UK UK 1726 2234
4 2001 UK UK 6433 9082
The original code is as follows:
library(tidyverse)
df %>%
gather(Year, val, -Variables, -Country) %>%
spread(Variables, val)
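I tried to extend the code. Because Ccode and Indicator Code ended up as rows in the list, I decided to use the codes as variable names and the variable names as labels (note that I therefore swapped -Variables and Variables for -Varcode and Varcode, respectively):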
library(tidyverse)
library(Hmisc)
List <- df$Variables
df<-df %>%
gather(Year, val, -Varcode, -Country) %>%
spread(Varcode, val)
for(i in List){
label(df[,i]) <- List[i]
}
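Note: I am using a list because of memory constraints.
I ran into two problems:
Two additional columns from df (among them Variables) end up where the values should be. Can someone help me figure out what is going wrong?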
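Answer 0 (score: 1)
I think you made a mistake when selecting the columns to gather: excluding only Varcode and Country means that Variables and Ccode are gathered together with the year columns, so their entries land in val.
Data: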
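library(tidyverse)
# The Ccode column is included so that unite()/separate() below can use it
df <- read.table(text = "Variables Varcode Country Ccode 2000 2001
1 Power P France FR 1213 1234
2 Happiness H France FR 1872 2345
3 Power P UK UK 1726 6433
4 Happiness H UK UK 2234 9082", header = TRUE, stringsAsFactors = FALSE) %>%
# read.table() prefixes numeric column names with "X"; restore the year names
rename(`2000` = X2000, `2001` = X2001)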
df %>%
select(-Varcode) %>%
gather(Year, val,`2000`:`2001`) %>%
unite(Country_Ccode, Country, Ccode, sep = "_") %>%
spread(Variables, val) %>%
separate(Country_Ccode, c("Country", "Ccode"), sep = "_")
Output:
Country Ccode Year Happiness Power
1 France FR 2000 1872 1213
2 France FR 2001 2345 1234
3 UK UK 2000 2234 1726
4 UK UK 2001 9082 6433
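The question also asked for the codes (P, H) as column names with the descriptive names as labels. A minimal sketch of one way to get there is shown below. It is not the method used in the answer above: it swaps in pivot_longer() and pivot_wider() (available in tidyr 1.0.0 and later) for gather() and spread(), and it stores each label in the base R "label" attribute, which to my understanding is the attribute Hmisc's label() works with. The names var_labels, lookup, and wide are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(
  Variables = c("Power", "Happiness", "Power", "Happiness"),
  Varcode   = c("P", "H", "P", "H"),
  Country   = c("France", "France", "UK", "UK"),
  Ccode     = c("FR", "FR", "UK", "UK"),
  `2000`    = c(1213, 1872, 1726, 2234),
  `2001`    = c(1234, 2345, 6433, 9082),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Build a code -> descriptive name lookup before Variables is dropped
lookup <- unique(df[, c("Varcode", "Variables")])
var_labels <- setNames(lookup$Variables, lookup$Varcode)

wide <- df %>%
  select(-Variables) %>%                       # keep only the codes
  pivot_longer(c(`2000`, `2001`),              # gather only the year columns
               names_to = "Year", values_to = "val") %>%
  pivot_wider(names_from = Varcode, values_from = val)

# Attach each descriptive name as a "label" attribute (assumption: plain
# attribute instead of Hmisc::label<-, to avoid the extra dependency)
for (code in names(var_labels)) {
  attr(wide[[code]], "label") <- var_labels[[code]]
}

print(wide)
```

Because only the year columns are reshaped, Country and Ccode stay as identifier columns and no unite()/separate() step is needed. Year comes out as a character column; convert it with as.integer() if a numeric year is required.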