How to reshape a data frame in R (rows to columns)

Date: 2019-12-10 18:20:39

Tags: r dataframe dplyr

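I have a data frame made up of 3 columns (category name, month, and the sum of units sold).

I want to reshape it so that each category name is a row and each column holds the unit sum for one of the 12 months, in an order I specify (starting with Oct and ending with Sep).

How can I do this? My current df is structured as follows: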

   `Category Name` Month   sum
   <fct>           <fct> <dbl>
 1 Diet Soda       Oct   34680
 2 Diet Soda       Nov   41589
 3 Diet Soda       Dec   31564
 4 Diet Soda       Jan   22635
 5 Diet Soda       Feb   34853
 6 Diet Soda       Mar   48583
 7 Diet Soda       Apr   33550
 8 Diet Soda       May   44991
 9 Diet Soda       Jun   34995
10 Diet Soda       Jul   33260
11 Diet Soda       Aug   46027
12 Diet Soda       Sep   33924
13 Diet Soda Can   Oct       0
14 Diet Soda Can   Nov       1
15 Diet Soda Can   Dec       0
16 Diet Soda Can   Jan       0
17 Diet Soda Can   Feb       0
18 Diet Soda Can   Mar       0
19 Diet Soda Can   Apr       0
20 Diet Soda Can   May       0

1 answer:

Answer 0 (score: 1)

One option is pivot_wider, applied after creating a sequence column by group:

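library(dplyr)
library(tidyr)
df1 %>%
   # number the rows within each Month / Category Name combination
   group_by(Month, `Category Name`) %>%
   mutate(rn = row_number()) %>%
   # drop the grouping so it does not carry over into the wide result
   ungroup() %>%
   # spread the months into columns, filled with the unit sums
   pivot_wider(names_from = Month, values_from = sum)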

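Note: the group_by/mutate step is not really needed for this data, but it helps in the general case, where a Month/`Category Name` combination may occur more than once.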
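To illustrate the general case mentioned in the note, here is a minimal sketch on a small made-up data frame (`dup` and its values are hypothetical, not taken from the question). One Category/Month combination appears twice, and the `rn` column keeps the two values in separate rows. Without `rn`, pivot_wider would warn about the duplicates and pack them into list-columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the combination A/Oct occurs twice
dup <- data.frame(
  Category = c("A", "A", "A", "B"),
  Month    = c("Oct", "Oct", "Nov", "Oct"),
  sum      = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

wide_dup <- dup %>%
  group_by(Category, Month) %>%
  mutate(rn = row_number()) %>%   # 1, 2, ... within each Category/Month pair
  ungroup() %>%
  pivot_wider(names_from = Month, values_from = sum)

# One row per Category/rn pair: (A, 1), (A, 2) and (B, 1);
# Oct and Nov stay plain integer columns rather than list-columns
wide_dup
```

The `rn` column can be dropped afterwards with `select(-rn)` if it is not needed.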

pivot_wider reshapes the data from "long" format to "wide" format:

df1 %>% 
    pivot_wider(names_from = Month, values_from = sum)
# A tibble: 2 x 13
#  `Category Name`   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep
#  <chr>           <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 Diet Soda       34680 41589 31564 22635 34853 48583 33550 44991 34995 33260 46027 33924
#2 Diet Soda Can       0     1     0     0     0     0     0     0    NA    NA    NA    NA
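In the output above the month columns come out in Oct to Sep order only because the rows already appear in that order. If the rows were ordered differently, one possible way to enforce the requested order is to reorder the columns after pivoting. The sketch below assumes a small made-up data frame (`sales`) and a hypothetical helper vector `month_order`. `any_of()` skips months that are absent from the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data whose rows are not in the desired month order
sales <- data.frame(
  Category = c("A", "A", "A", "B", "B"),
  Month    = c("Jan", "Oct", "Sep", "Oct", "Jan"),
  sum      = c(5L, 1L, 9L, 2L, 6L),
  stringsAsFactors = FALSE
)

# Desired column order: starts with Oct, ends with Sep
month_order <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar",
                 "Apr", "May", "Jun", "Jul", "Aug", "Sep")

wide_sales <- sales %>%
  pivot_wider(names_from = Month, values_from = sum) %>%
  # keep the id column first, then the months in the requested order
  select(Category, any_of(month_order))

names(wide_sales)
```

Category/month combinations that are missing from the long data show up as NA in the wide result, as in the `Diet Soda Can` row above.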

Data

df1 <- structure(list(`Category Name` = c("Diet Soda", "Diet Soda", 
"Diet Soda", "Diet Soda", "Diet Soda", "Diet Soda", "Diet Soda", 
"Diet Soda", "Diet Soda", "Diet Soda", "Diet Soda", "Diet Soda", 
"Diet Soda Can", "Diet Soda Can", "Diet Soda Can", "Diet Soda Can", 
"Diet Soda Can", "Diet Soda Can", "Diet Soda Can", "Diet Soda Can"
), Month = c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", 
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", 
"Feb", "Mar", "Apr", "May"), sum = c(34680L, 41589L, 31564L, 
22635L, 34853L, 48583L, 33550L, 44991L, 34995L, 33260L, 46027L, 
33924L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame",
row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20"))