Reshaping a table with dplyr in R

Date: 2017-11-23 19:21:34

Tags: r dplyr transform reshape

Some advice on how to apply dplyr correctly in R would be appreciated. We have the following data:

   City            Amount    Category
1  Los Angeles     100       Film
2  Los Angeles     200       Film
3  Los Angeles     400       Music 
4  Seattle         300       Coffee
5  Boston          600       Books
...

The final result should look like this:

                        Film   Coffee   Books   ...
City  
Los Angeles, CA         Sum    Sum      Sum     Sum 
Seattle, WA             Sum    Sum      Sum     Sum 
Boston, MA              Sum    Sum      Sum     Sum  

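I want a pivot table that sums "Amount" for each category within each city, with the cities as rows down the left-hand side and the categories as columns across the top.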

My attempt:

data %>%
  group_by(City, Category) %>%   # the column is named City, not Location
  summarise(Amount = sum(Amount))

The output looks more like this:

   City            Amount    Category
1  Los Angeles     300       Film
3  Los Angeles     400       Music 
4  Seattle         300       Coffee
5  Boston          600       Books

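The calculation is correct, but as described above, we need cities and categories laid out as a matrix, with each summed amount in the corresponding cell.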

Thanks for your help!

1 answer:

Answer 0: (score: 2)

What you are looking for is tidyr::spread, which reshapes your data.frame from long format to wide format:

library(tidyverse)

# recreate the data
data <- tribble(
  ~City,             ~Amount,   ~Category,
  "Los Angeles",     100,       "Film",
  "Los Angeles",     200,       "Film",
  "Los Angeles",     400,       "Music", 
  "Seattle",         300,       "Coffee",
  "Boston",          600,       "Books"
)

# using your code to get the data in the long-format
data_long <- data %>% 
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount))

data_long
#> # A tibble: 4 x 3
#> # Groups:   City [?]
#>          City Category Amount
#>         <chr>    <chr>  <dbl>
#> 1      Boston    Books    600
#> 2 Los Angeles     Film    300
#> 3 Los Angeles    Music    400
#> 4     Seattle   Coffee    300

# spread to wide using the tidyr-package (in tidyverse)
data_wide <- spread(data_long, key = "Category", value = "Amount", fill = 0)

data_wide
#> # A tibble: 3 x 5
#> # Groups:   City [3]
#>          City Books Coffee  Film Music
#> *       <chr> <dbl>  <dbl> <dbl> <dbl>
#> 1      Boston   600      0     0     0
#> 2 Los Angeles     0      0   300   400
#> 3     Seattle     0    300     0     0
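
In newer versions of tidyr (1.0.0 and later), spread() is marked as superseded in favour of pivot_wider(). The following is a minimal sketch of the same long-to-wide step under that assumption; the name data_wide2 is only illustrative, and the sample data is rebuilt with data.frame() so the snippet runs on its own. Note that pivot_wider() orders the new columns by first appearance rather than alphabetically, so the column order may differ from the spread() result above.

```r
library(dplyr)
library(tidyr)  # pivot_wider() assumes tidyr >= 1.0.0

# same sample data as in the question
data <- data.frame(
  City     = c("Los Angeles", "Los Angeles", "Los Angeles", "Seattle", "Boston"),
  Amount   = c(100, 200, 400, 300, 600),
  Category = c("Film", "Film", "Music", "Coffee", "Books"),
  stringsAsFactors = FALSE
)

data_wide2 <- data %>%
  group_by(City, Category) %>%
  summarise(Amount = sum(Amount)) %>%   # one row per City/Category pair
  ungroup() %>%                         # drop the remaining grouping by City
  pivot_wider(
    names_from  = Category,             # one new column per category
    values_from = Amount,               # cells hold the summed amounts
    values_fill = list(Amount = 0)      # missing City/Category pairs become 0
  )

data_wide2
```
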

Converting to a matrix:

mat <- as.matrix(data_wide %>% ungroup %>% select(-City))
rownames(mat) <- data_wide$City

mat
#>             Books Coffee Film Music
#> Boston        600      0    0     0
#> Los Angeles     0      0  300   400
#> Seattle         0    300    0     0

str(mat)
#>  num [1:3, 1:4] 600 0 0 0 0 300 0 300 0 0 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:3] "Boston" "Los Angeles" "Seattle"
#>   ..$ : chr [1:4] "Books" "Coffee" "Film" "Music"
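
If the matrix is all that is needed, base R can probably get there in one step, without the intermediate long and wide tibbles. This is a sketch of an alternative, not part of the approach above: xtabs() sums the left-hand side of the formula over every combination of the right-hand-side variables and leaves 0 where a combination does not occur. The name tab is only illustrative, and the sample data is rebuilt so the snippet is self-contained.

```r
# same sample data as in the question
data <- data.frame(
  City     = c("Los Angeles", "Los Angeles", "Los Angeles", "Seattle", "Boston"),
  Amount   = c(100, 200, 400, 300, 600),
  Category = c("Film", "Film", "Music", "Coffee", "Books"),
  stringsAsFactors = FALSE
)

# cross-tabulate: rows are cities, columns are categories, cells are summed amounts
tab <- xtabs(Amount ~ City + Category, data = data)

tab
tab["Los Angeles", "Film"]   # look up a single cell by row and column name
```

The result is a two-dimensional object of class "xtabs"/"table", which can be indexed by row and column name just like the matrix built above.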