Question

My data is in the following format:

id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3

I would like to rearrange it into the following format:

id Year Profit
1  2017      2
1  2016      3
1  2015      6
1  2014      7
2  2017      4
2  2016      1
2  2015      8
2  2014      3

I don't know where to start. Any suggestions for libraries or useful resources I could look at?

Answer 1

This is a classic wide-to-long reshaping problem. There is a good tutorial here. Below is a solution based on the gather function from the tidyr package.

dts_wide <- read.table(header=TRUE, text ='
id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3 
')

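library(tidyr)
# rename the value columns to the bare years (column 1 is id)
names(dts_wide)[-1] <- as.character(2017:2014)

# gather columns 2 to 5 into key/value pairs named Year and Profit
dts_long <- gather(data=dts_wide, key=Year, value=Profit, 2:5, factor_key=TRUE)
# Year is a factor at this point; go through character to get the numeric year
dts_long$Year <- as.numeric(as.character(dts_long$Year))
dts_long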

#   id Year Profit
# 1  1 2017      2
# 2  2 2017      4
# 3  1 2016      3
# 4  2 2016      1
# 5  1 2015      6
# 6  2 2015      8
# 7  1 2014      7
# 8  2 2014      3
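In current tidyr releases gather is superseded by pivot_longer. The following is a minimal sketch of the same reshape with pivot_longer, assuming tidyr 1.1.0 or later (needed for names_transform); it is an alternative to the code above, not part of the original answer. It also happens to return the rows grouped by id, which is the order shown in the question.

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7
2           4          1          8          3
")

dts_long <- pivot_longer(
  dts_wide,
  cols = starts_with("profit"),              # every profitYYYY column
  names_to = "Year",                         # old column names go here
  names_prefix = "profit",                   # drop the prefix, leaving the year
  names_transform = list(Year = as.integer), # store the year as an integer
  values_to = "Profit"                       # cell values go here
)
dts_long
```

The result is a tibble with columns id, Year and Profit, one row per id and year.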

Edit
If there are two sets of columns (profit and revenue), a possible (inelegant) solution is:

dts_wide <- read.table(header=TRUE, text ='
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1           2          3          6          7          21          31          61          71
2           4          1          8          3          22          32          62          72
')

library(tidyr)
library(stringr)

dts_long <- dts_wide %>% gather(key=Year, value=Profit, 2:9, factor_key=TRUE)
dts_long$key_tmp <- str_sub(dts_long$Year,1,-5)
dts_long$Year <- as.numeric(str_sub(dts_long$Year,-4,-1))
( dts_long <- dts_long %>% spread(key_tmp, Profit) )

#   id Year profit revenue
# 1  1 2014      7      71
# 2  1 2015      6      61
# 3  1 2016      3      31
# 4  1 2017      2      21
# 5  2 2014      3      72
# 6  2 2015      8      62
# 7  2 2016      1      32
# 8  2 2017      4      22
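For the two-variable case, pivot_longer can split each column name into a variable part and a year part in one step by using the special ".value" name. This is a sketch under the assumption that tidyr 1.1.0 or later is installed and that every value column is named as lowercase letters followed by a four-digit year; the regular expression is my own choice, not something from the answer above.

```r
library(tidyr)

dts_wide <- read.table(header = TRUE, text = "
id profit2017 profit2016 profit2015 profit2014 revenue2017 revenue2016 revenue2015 revenue2014
1 2 3 6 7 21 31 61 71
2 4 1 8 3 22 32 62 72
")

dts_long <- pivot_longer(
  dts_wide,
  cols = -id,                                # reshape everything except id
  names_to = c(".value", "Year"),            # first group becomes a column, second becomes Year
  names_pattern = "([a-z]+)([0-9]{4})",      # e.g. "profit2017" -> "profit", "2017"
  names_transform = list(Year = as.integer)  # store the year as an integer
)
dts_long
```

This yields one row per id and year with separate profit and revenue columns, without the intermediate gather and spread steps.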

Answer 2

You can also use melt from the reshape2 package:

df <- read.table(header = TRUE, 
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7 
2           4          1          8          3" )

library(reshape2)

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))
# go to long format with "id" as id.var, the rest are measure.vars
melt(df, id.vars = "id")
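By default melt names the new columns variable and value, and variable is a factor. To get exactly the id / Year / Profit layout from the question, one possible follow-up (a sketch, using melt's variable.name and value.name arguments plus a base R sort) is:

```r
library(reshape2)

df <- read.table(header = TRUE,
text = "id profit2017 profit2016 profit2015 profit2014
1           2          3          6          7
2           4          1          8          3")

# drop the pattern "profit" from the column names
names(df) <- sub(pattern = "profit", replacement = "", names(df))

# name the long-format columns directly
df_long <- melt(df, id.vars = "id", variable.name = "Year", value.name = "Profit")

# Year comes back as a factor; convert it via character to an integer
df_long$Year <- as.integer(as.character(df_long$Year))

# sort by id, then by year descending, as in the question
df_long <- df_long[order(df_long$id, -df_long$Year), ]
df_long
```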

Rearranging cross-sectional time-series panel data in R
