Parse data frame column names and pivot the data frame in R

Time: 2016-02-19 16:51:33

Tags: r dataframe

When I run the following reproducible example code,

value = c(1:5)
name = c('A','B','C','D','F')
`2015.cost` = c(100,200,250,300,450)
`2016.cost` = c(200,300,350,400,550)
`2017.cost` = c(300,400,450,500,650)
`2015.profit` = c(1000,4200,2450,1500,7650)
`2016.profit` = c(1300,4300,3450,5100,6850)
`2017.profit` = c(1300,4400,6450,5001,6500)
df <- data.frame(value,name,`2015.cost`,`2016.cost`,`2017.cost`,`2015.profit`,`2016.profit`,`2017.profit`)

I get the following data frame:

structure(list(value = 1:5, name = structure(1:5, .Label = c("A", 
"B", "C", "D", "F"), class = "factor"), X2015.cost = c(100, 200, 
250, 300, 450), X2016.cost = c(200, 300, 350, 400, 550), X2017.cost = c(300, 
400, 450, 500, 650), X2015.profit = c(1000, 4200, 2450, 1500, 
7650), X2016.profit = c(1300, 4300, 3450, 5100, 6850), X2017.profit = c(1300, 
4400, 6450, 5001, 6500)), .Names = c("value", "name", "X2015.cost", 
"X2016.cost", "X2017.cost", "X2015.profit", "X2016.profit", "X2017.profit"
), row.names = c(NA, -5L), class = "data.frame")

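Essentially, I want to parse the year out of my cost and profit column names (splitting on a separator such as '.'), and then reshape the data frame from this wide format to long, so that all the years are in one column and the cost and profit values each get their own column. My goal is a data frame like this: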

value name  year    cost    profit
1    A   2015    100    1000
2    B   2015    200    4200
3    C   2015    250    2450
4    D   2015    300    1500
5    F   2015    450    7650
1    A   2016    200    1300
2    B   2016    300    4300
3    C   2016    350    3450
4    D   2016    400    5100
5    F   2016    550    6850
1    A   2017    300    1300
2    B   2017    400    4400
3    C   2017    450    6450
4    D   2017    500    5001
5    F   2017    650    6500

Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 4)

Here is one way with dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>%
    gather('col', 'val', X2015.cost:X2017.profit) %>%
    separate(col, c('year', 'col_type'), sep='\\.') %>%
    mutate(year=extract_numeric(year)) %>%
    spread(col_type, val)
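
The same reshape can also be sketched with tidyr's newer `pivot_longer`, which splits the column names and spreads cost/profit in one step. This is not part of the original answer; it assumes tidyr 1.1.0 or later (for `names_transform`), and the object name `long` is only illustrative. The special name `.value` tells `pivot_longer` that the second captured piece of each column name ("cost" or "profit") should become an output column.

```r
library(tidyr)

# Rebuild the question's data frame (column names as R stores them, with the X prefix)
df <- data.frame(
  value = 1:5,
  name = c("A", "B", "C", "D", "F"),
  X2015.cost = c(100, 200, 250, 300, 450),
  X2016.cost = c(200, 300, 350, 400, 550),
  X2017.cost = c(300, 400, 450, 500, 650),
  X2015.profit = c(1000, 4200, 2450, 1500, 7650),
  X2016.profit = c(1300, 4300, 3450, 5100, 6850),
  X2017.profit = c(1300, 4400, 6450, 5001, 6500)
)

long <- pivot_longer(
  df,
  cols = X2015.cost:X2017.profit,
  # first capture group -> "year" column, second -> one column per measure
  names_to = c("year", ".value"),
  names_pattern = "X(\\d+)\\.(.+)",
  names_transform = list(year = as.integer)
)

# pivot_longer keeps rows grouped by the original row; sort by year to match the target layout
long <- long[order(long$year, long$value), ]
print(long)
```

Compared with `gather`/`separate`/`spread`, this avoids the intermediate key/value column, at the cost of needing a regular expression that matches every measured column name.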

Answer 1: (score: 3)

We can use melt from data.table, where measure can take multiple columns.

library(data.table)
dM <- melt(setDT(df1), measure=patterns('cost$', 'profit$'),
           variable.name='year', value.name=c('cost', 'profit'))
dM[, year := as.numeric(unique(gsub('\\D+', '', grep('^X\\d+', names(df1), value=TRUE))))[year]]
dM
#    value name year cost profit
# 1:     1    A 2015  100   1000
# 2:     2    B 2015  200   4200
# 3:     3    C 2015  250   2450
# 4:     4    D 2015  300   1500
# 5:     5    F 2015  450   7650
# 6:     1    A 2016  200   1300
# 7:     2    B 2016  300   4300
# 8:     3    C 2016  350   3450
# 9:     4    D 2016  400   5100
#10:     5    F 2016  550   6850
#11:     1    A 2017  300   1300
#12:     2    B 2017  400   4400
#13:     3    C 2017  450   6450
#14:     4    D 2017  500   5001
#15:     5    F 2017  650   6500
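
The year-recoding step in this answer can be looked at in isolation with base R. The sketch below is only an illustration: `cols`, `years`, and `idx` are hypothetical names, and `idx` stands in for the factor column that `melt` produces when several measure patterns are given (levels "1", "2", "3", one per matched column group).

```r
# Column names as they appear in the question's data frame
cols <- c("value", "name",
          "X2015.cost", "X2016.cost", "X2017.cost",
          "X2015.profit", "X2016.profit", "X2017.profit")

# Keep only the X<digits> columns, strip every non-digit, and de-duplicate
years <- as.numeric(unique(gsub("\\D+", "", grep("^X\\d+", cols, value = TRUE))))
print(years)
# 2015 2016 2017

# Stand-in for the melted 'year' column: a factor whose integer codes are 1, 2, 3
idx <- factor(c(1, 2, 3, 1))

# Indexing by a factor uses its integer codes, so each code picks the matching year
recoded <- years[idx]
print(recoded)
# 2015 2016 2017 2015
```

This relies on the columns appearing in the same year order within each measure group, which holds for the data in the question.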
