Rearranging cross-sectional time-series panel data in R

Time: 2017-07-01 10:04:47

Tags: r panel

My data is in the following format:

id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 

I would like to rearrange it into the following format:

id Year Profit
1  2017      2
1  2016      3
1  2015      6
1  2014      7
2  2017      4
2  2016      1
2  2015      8
2  2014      3

I don't know how to start. Any suggestions for packages or useful resources I could look at?

2 answers:

Answer 0: (score: 2)

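This is the classic problem of converting from wide format to long format. There is a good tutorial here. Below is a solution based on the gather function from the tidyr package. The column names are first reduced to the bare years so that the key column can be converted to numeric afterwards.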

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 
')

library(tidyr)
names(dts_wide)[-1] <- as.character(2017:2014)

dts_long <- gather(data=dts_wide, key=Year, value=Profit, 2:5, factor_key=TRUE)
dts_long$Year <- as.numeric(as.character(dts_long$Year))
dts_long

#   id Year Profit
# 1  1 2017      2
# 2  2 2017      4
# 3  1 2016      3
# 4  2 2016      1
# 5  1 2015      6
# 6  2 2015      8
# 7  1 2014      7
# 8  2 2014      3
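For what it's worth, newer versions of tidyr offer pivot_longer as the successor to gather. The following is only a sketch of how the same reshaping might look with it, assuming tidyr 1.0.0 or later is installed; it is not part of the original answer. It also happens to keep the rows grouped by id, as in the layout the question asks for.

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014
1 2 3 6 7
2 4 1 8 3
")

# Stack every column except id; names_prefix strips the leading
# "profit" so that only the four-digit year is left in Year.
dts_long <- pivot_longer(
  dts_wide,
  cols = -id,
  names_to = "Year",
  names_prefix = "profit",
  values_to = "Profit"
)

# Year comes back as character, so convert it explicitly.
dts_long$Year <- as.integer(dts_long$Year)
dts_long
```

The result is a tibble rather than a plain data.frame; wrap it in as.data.frame() if that matters downstream.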

Edit
If there are two groups of columns (profit and revenue), a possible (not very elegant) solution is:

dts_wide <- read.table(header=T, text ='
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1           2          3          6          7         21          31          61          71 
2           4          1          8          3         22          32          62          72
')

library(tidyr)
library(stringr)

dts_long <- dts_wide %>% gather(key=Year, value=Profit, 2:9, factor_key=TRUE) 
dts_long$key_tmp <- str_sub(dts_long$Year,1,-5)
dts_long$Year <- as.numeric(str_sub(dts_long$Year,-4,-1))
( dts_long <- dts_long %>% spread(key_tmp, Profit) )

#   id Year profit revenue
# 1  1 2014      7      71
# 2  1 2015      6      61
# 3  1 2016      3      31
# 4  1 2017      2      21
# 5  2 2014      3      72
# 6  2 2015      8      62
# 7  2 2016      1      32
# 8  2 2017      4      22
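A possibly tidier route for the two-group case, again assuming tidyr 1.0.0 or later, is to let pivot_longer split each column name into a measure part and a year part. This is a sketch rather than the original answer's method; the regular expression assumes every column other than id is named as lowercase letters followed by a four-digit year.

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1 2 3 6 7 21 31 61 71
2 4 1 8 3 22 32 62 72
")

# ".value" tells pivot_longer that the first captured group (profit or
# revenue) names an output column, while the second group goes into Year.
dts_long <- pivot_longer(
  dts_wide,
  cols = -id,
  names_to = c(".value", "Year"),
  names_pattern = "([a-z]+)([0-9]{4})"
)

# Year is extracted as character, so convert it explicitly.
dts_long$Year <- as.integer(dts_long$Year)
dts_long
```

This avoids the intermediate key_tmp column and the gather/spread round trip, at the cost of requiring the newer tidyr API.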

Answer 1: (score: 2)

You can also use melt from the reshape2 package:

df <- read.table(header = TRUE, 
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3" )

library(reshape2)

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))
# go to long format with "id" as id.var, the rest are measure.vars
melt(df, id.vars = "id")
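By default, melt names the new columns variable and value, and the year column comes back as a factor. A minimal sketch of getting closer to the layout in the question is shown below; it uses melt's variable.name and value.name arguments, and the names df_long, Year and Profit are simply chosen here to match the desired output.

```r
library(reshape2)

df <- read.table(header = TRUE,
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7
2           4          1          8          3")

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))

# name the key and value columns directly while melting
df_long <- melt(df, id.vars = "id",
                variable.name = "Year", value.name = "Profit")

# Year is a factor after melt; go through character to get the numbers
df_long$Year <- as.integer(as.character(df_long$Year))

# sort by id so each unit's years sit together, as in the question
df_long <- df_long[order(df_long$id), ]
rownames(df_long) <- NULL
df_long
```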