My data is in the following format:
id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3
I would like to rearrange it into the following format:
id Year Profit
1 2017 2
1 2016 3
1 2015 6
1 2014 7
2 2017 4
2 2016 1
2 2015 8
2 2014 3
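I don't know where to start. Any suggestions for packages or useful material I could look at?
Answer 0 (score: 2)
This is a classic case of converting data from wide format to long format.
There is a good tutorial here.
Below is a solution based on the gather function from the tidyr package.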
dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3
')
library(tidyr)
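# Rename the profit columns to bare years so the key column holds year labels
names(dts_wide)[-1] <- as.character(2017:2014)
# Stack columns 2 to 5 into Year/Profit pairs
dts_long <- gather(data=dts_wide, key=Year, value=Profit, 2:5, factor_key=TRUE)
# Year is a factor at this point; go through character to get the numeric year
dts_long$Year <- as.numeric(as.character(dts_long$Year))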
dts_long
# id Year Profit
# 1 1 2017 2
# 2 2 2017 4
# 3 1 2016 3
# 4 2 2016 1
# 5 1 2015 6
# 6 2 2015 8
# 7 1 2014 7
# 8 2 2014 3
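In tidyr 1.0.0 and later, gather is superseded by pivot_longer. A minimal sketch of the same reshaping, assuming a tidyr version that provides pivot_longer (the column selection via starts_with and the names_prefix argument are my choices here, not part of the solution above):

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3
")

# Stack every column whose name starts with "profit";
# names_prefix strips "profit" so only the year remains in the Year column
dts_long <- pivot_longer(
  dts_wide,
  cols = starts_with("profit"),
  names_to = "Year",
  names_prefix = "profit",
  values_to = "Profit"
)

# The years arrive as character strings; convert them to integers
dts_long$Year <- as.integer(dts_long$Year)
dts_long
```

pivot_longer returns a tibble and keeps the rows grouped by id, so the result follows the row order requested in the question.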
Edit
If there are two groups of columns (profit and revenue), a possible (inelegant) solution is:
dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1 2 3 6 7 21 31 61 71
2 4 1 8 3 22 32 62 72
')
library(tidyr)
library(stringr)
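# Stack all eight measure columns; Year temporarily holds names such as "profit2017"
dts_long <- dts_wide %>% gather(key=Year, value=Profit, 2:9, factor_key=TRUE)
# Split each name into its variable part ("profit" or "revenue") and its four-digit year
dts_long$key_tmp <- str_sub(dts_long$Year,1,-5)
dts_long$Year <- as.numeric(str_sub(dts_long$Year,-4,-1))
# Spread the variable part back out into one column per measure
( dts_long <- dts_long %>% spread(key_tmp, Profit) )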
# id Year profit revenue
# 1 1 2014 7 71
# 2 1 2015 6 61
# 3 1 2016 3 31
# 4 1 2017 2 21
# 5 2 2014 3 72
# 6 2 2015 8 62
# 7 2 2016 1 32
# 8 2 2017 4 22
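With a tidyr version that provides pivot_longer, the two-group case can probably be handled in one step using the special ".value" entry in names_to. This is only a sketch: the regular expression in names_pattern is my assumption about the naming scheme (lowercase letters followed by a four-digit year).

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1 2 3 6 7 21 31 61 71
2 4 1 8 3 22 32 62 72
")

# ".value" tells pivot_longer that the first captured group ("profit" or
# "revenue") names an output column; the second group becomes Year
dts_long <- pivot_longer(
  dts_wide,
  cols = -id,
  names_to = c(".value", "Year"),
  names_pattern = "([a-z]+)(\\d{4})"
)

# The years arrive as character strings; convert them to integers
dts_long$Year <- as.integer(dts_long$Year)
dts_long
```

This avoids the temporary key_tmp column and the gather/spread round trip, at the cost of relying on the column names matching the pattern.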
Answer 1 (score: 2)
You can also use melt from the reshape2 package:
df <- read.table(header = TRUE,
text = "id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3" )
library(reshape2)
# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))
# go to long format with "id" as id.var, the rest are measure.vars
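# variable.name / value.name give the output columns the names asked for in the question
melt(df, id.vars = "id", variable.name = "Year", value.name = "Profit")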
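If you would rather not load an extra package, base R's reshape function can do the same conversion. A rough sketch, assuming the original profit column names are still in place (the object name df_long is just for illustration):

```r
df <- read.table(header = TRUE,
text = "id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3")

# varying lists the wide columns, times gives the year each one stands for
# (in the same order), and v.names names the single stacked value column
df_long <- reshape(df,
                   direction = "long",
                   varying = paste0("profit", 2017:2014),
                   v.names = "Profit",
                   timevar = "Year",
                   times = 2017:2014,
                   idvar = "id")

# Sort by id, then by descending year, to match the layout in the question
df_long <- df_long[order(df_long$id, -df_long$Year), ]
df_long
```

The result keeps row names of the form id.Year; use rownames(df_long) <- NULL if you want plain row numbers.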