Reshaping multiple sets of columns from wide to long

Time: 2019-12-10 08:54:50

Tags: r tidyr

I have a data frame, df_wide, that holds company data in wide format:

    df_wide <- data.frame(Company=c('CompanyA','CompanyB', 'CompanyC'),
             Industry=c('Manufacturing', 'Telecom', 'Services'),
             Sales.2015=c('100', '500', '1000'), 
             Sales.2016=c('110', '550', '1100'), 
             Sales.2017=c('120', '600', '1200'),
             EBITDA.2015=c('10', '50', '100'), 
             EBITDA.2016=c('11', '55', '110'),
             EBITDA.2017=c('12', '60', '120'))

        Company      Industry Sales.2015 Sales.2016 Sales.2017 EBITDA.2015 EBITDA.2016 EBITDA.2017
    1 CompanyA Manufacturing        100        110        120          10          11          12
    2 CompanyB       Telecom        500        550        600          50          55          60
    3 CompanyC      Services       1000       1100       1200         100         110         120

I would like to convert the data to a long format like df_long:

    df_long <- data.frame(Company=c('CompanyA', 'CompanyA', 'CompanyA', 'CompanyB', 'CompanyB','CompanyB','CompanyC','CompanyC', 'CompanyC'),
              Industry=c('Manufacturing','Manufacturing','Manufacturing','Telecom','Telecom','Telecom','Services','Services','Services'),
              Year=c('2015','2016','2017','2015','2016','2017','2015','2016','2017'),
              Sales=c('100','110','120','500', '550','600','1000','1100','1200'),
              EBITDA=c('10','11','12','50','55','60','100','110','120'))

       Company      Industry Year Sales EBITDA
    1 CompanyA Manufacturing 2015   100     10
    2 CompanyA Manufacturing 2016   110     11
    3 CompanyA Manufacturing 2017   120     12
    4 CompanyB       Telecom 2015   500     50
    5 CompanyB       Telecom 2016   550     55
    6 CompanyB       Telecom 2017   600     60
    7 CompanyC      Services 2015  1000    100
    8 CompanyC      Services 2016  1100    110
    9 CompanyC      Services 2017  1200    120

I have tried pivot_longer, and it works fine with a single variable, but I am stuck when trying to reshape Sales and EBITDA at the same time.

    df_long2 <- df_wide %>% pivot_longer(cols = starts_with("Sales"),
                                 names_to = "Year",
                                 values_to = "Sales")

4 answers:

Answer 0 (score: 5)

Using pivot_longer:

    library(tidyr)  # provides pivot_longer() and re-exports %>%

    # ".value" means the first part of each name (Sales, EBITDA) becomes
    # an output column; the second part (the year) goes into Year
    pivot_longer(df_wide,
                 cols = -c(Company, Industry),
                 names_to = c(".value", "Year"),
                 names_sep = "\\.") %>%
      type.convert(as.is = FALSE)  # as.is = FALSE gives the factor columns shown below

    #  Company  Industry       Year Sales EBITDA
    #  <fct>    <fct>         <int> <int>  <int>
    #1 CompanyA Manufacturing  2015   100     10
    #2 CompanyA Manufacturing  2016   110     11
    #3 CompanyA Manufacturing  2017   120     12
    #4 CompanyB Telecom        2015   500     50
    #5 CompanyB Telecom        2016   550     55
    #6 CompanyB Telecom        2017   600     60
    #7 CompanyC Services       2015  1000    100
    #8 CompanyC Services       2016  1100    110
    #9 CompanyC Services       2017  1200    120
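What `.value` does can be seen by splitting the column names by hand. The sketch below uses base R only and illustrates what `names_sep = "\\."` does with these particular names; it is not a description of tidyr internals. `wide_names`, `parts`, `value_columns` and `year_values` are names introduced just for this example.

```r
# Column names that pivot_longer() has to interpret (identifier columns excluded)
wide_names <- c("Sales.2015", "Sales.2016", "Sales.2017",
                "EBITDA.2015", "EBITDA.2016", "EBITDA.2017")

# Split each name at the literal dot, mirroring names_sep = "\\."
parts <- do.call(rbind, strsplit(wide_names, ".", fixed = TRUE))
colnames(parts) <- c(".value", "Year")

# First piece: names of the value columns in the long result
value_columns <- unique(parts[, ".value"])
# Second piece: the values stored in the new Year column
year_values <- unique(parts[, "Year"])

print(value_columns)
print(year_values)
```

Each distinct first piece becomes its own column in the long table, which is why Sales and EBITDA end up side by side instead of stacked in one value column.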

Answer 1 (score: 1)

Base R solution:

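    # timevar is not set, so the year column gets reshape()'s default name, "time"
    df_long <-
      reshape(df_wide,
              direction = "long",
              # reshape every column except the two identifier columns
              varying = which(!names(df_wide) %in% c("Company", "Industry")),
              ids = NULL,
              new.row.names = 1:(length(which(!names(df_wide) %in% c("Company", "Industry"))) * nrow(df_wide)))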

Answer 2 (score: 0)

I am not yet familiar with pivot_longer(), but here is a data.table solution:
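The code for this answer is not included above. As an illustration only, and not the answerer's original code, here is a minimal sketch of how a data.table approach could look, assuming `melt()` with `patterns()` to reshape both sets of columns at once. It also assumes the Sales and EBITDA columns appear in the same year order; `dt_wide`, `dt_long` and `years` are names chosen for this example.

```r
library(data.table)

df_wide <- data.frame(Company = c("CompanyA", "CompanyB", "CompanyC"),
                      Industry = c("Manufacturing", "Telecom", "Services"),
                      Sales.2015 = c("100", "500", "1000"),
                      Sales.2016 = c("110", "550", "1100"),
                      Sales.2017 = c("120", "600", "1200"),
                      EBITDA.2015 = c("10", "50", "100"),
                      EBITDA.2016 = c("11", "55", "110"),
                      EBITDA.2017 = c("12", "60", "120"),
                      stringsAsFactors = FALSE)

dt_wide <- as.data.table(df_wide)

# Years taken from the Sales column names; assumes the EBITDA columns
# follow the same year order
years <- sub("^Sales\\.", "", grep("^Sales\\.", names(dt_wide), value = TRUE))

# One pattern per output value column; value.name labels them in the same order
dt_long <- melt(dt_wide,
                id.vars = c("Company", "Industry"),
                measure.vars = patterns("^Sales\\.", "^EBITDA\\."),
                variable.name = "Year",
                value.name = c("Sales", "EBITDA"))

# melt() numbers the matched columns 1, 2, 3; map those back to the years
dt_long[, Year := years[as.integer(Year)]]

# Group rows by company, as in the layout requested in the question
setorder(dt_long, Company, Year)
print(dt_long)
```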

Answer 3 (score: 0)

Here is a base R solution (similar to @hello_friend's), where reshape() is used to convert the table from wide to long:

    df_long <- reshape(df_wide,
            direction = "long",
            varying = seq(df_wide)[-(1:2)],
            ids = NULL,
            timevar = "Year",
            times = unique(gsub("\\w+\\.(.*)","\\1",names(df_wide[-(1:2)]))),
            new.row.names = seq(ncol(df_wide[-(1:2)])*nrow(df_wide))
            )

which gives

    > df_long
       Company      Industry Year Sales EBITDA
    1 CompanyA Manufacturing 2015   100     10
    2 CompanyB       Telecom 2015   500     50
    3 CompanyC      Services 2015  1000    100
    4 CompanyA Manufacturing 2016   110     11
    5 CompanyB       Telecom 2016   550     55
    6 CompanyC      Services 2016  1100    110
    7 CompanyA Manufacturing 2017   120     12
    8 CompanyB       Telecom 2017   600     60
    9 CompanyC      Services 2017  1200    120
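This result is ordered by year, while the layout requested in the question groups the rows by company. A minimal sketch of one way to get that ordering in base R follows. It is a variation on the call above, not a replacement for it: it assumes Company uniquely identifies a row (so it can serve as `idvar`) and lets `reshape()` infer the value columns and years from the `.` in the names. It also converts Sales and EBITDA to numbers, a step the answer does not take.

```r
df_wide <- data.frame(Company = c("CompanyA", "CompanyB", "CompanyC"),
                      Industry = c("Manufacturing", "Telecom", "Services"),
                      Sales.2015 = c("100", "500", "1000"),
                      Sales.2016 = c("110", "550", "1100"),
                      Sales.2017 = c("120", "600", "1200"),
                      EBITDA.2015 = c("10", "50", "100"),
                      EBITDA.2016 = c("11", "55", "110"),
                      EBITDA.2017 = c("12", "60", "120"),
                      stringsAsFactors = FALSE)

# Names such as "Sales.2015" are split at sep into a value column and a time
df_long <- reshape(df_wide,
                   direction = "long",
                   varying = names(df_wide)[-(1:2)],
                   sep = ".",
                   timevar = "Year",
                   idvar = "Company")  # assumption: one row per company

# Sort by company, then year, and drop the generated row names
df_long <- df_long[order(df_long$Company, df_long$Year), ]
rownames(df_long) <- NULL

# The source values are strings; convert the measures to numbers
df_long[c("Sales", "EBITDA")] <- lapply(df_long[c("Sales", "EBITDA")], as.numeric)

print(df_long)
```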