Any advice on dplyr would be much appreciated.
We have the following data:
City Amount Category
1 Los Angeles 100 Film
2 Los Angeles 200 Film
3 Los Angeles 400 Music
4 Seattle 300 Coffee
5 Boston 600 Books
...
The final result should look like this:
Film Coffee Books ...
City
Los Angeles, CA Sum Sum Sum Sum
Seattle, WA Sum Sum Sum Sum
Boston, MA Sum Sum Sum Sum
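I want a pivot table that sums the "Amount" for each category within each city, so that the cities run down the left-hand side as rows and all the categories run across the top as columns.
My attempt: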
data %>%
group_by(City, Category) %>%
summarise(Amount = sum(Amount))
gives something that looks more like this:
City Amount Category
1 Los Angeles 300 Film
3 Los Angeles 400 Music
4 Seattle 300 Coffee
5 Boston 600 Books
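The calculation is correct, but as described above, we need the cities and categories laid out as a matrix, with each summed amount in the corresponding cell.
Thanks for your help!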
Answer 0 (score: 2)
What you are looking for is tidyr::spread, which reshapes your data.frame from long format to wide format:
library(tidyverse)
# recreate the data
data <- tribble(
~City, ~Amount, ~Category,
"Los Angeles", 100, "Film",
"Los Angeles", 200, "Film",
"Los Angeles", 400, "Music",
"Seattle", 300, "Coffee",
"Boston", 600, "Books"
)
# using your code to get the data in the long-format
data_long <- data %>%
group_by(City, Category) %>%
summarise(Amount = sum(Amount))
data_long
#> # A tibble: 4 x 3
#> # Groups: City [?]
#> City Category Amount
#> <chr> <chr> <dbl>
#> 1 Boston Books 600
#> 2 Los Angeles Film 300
#> 3 Los Angeles Music 400
#> 4 Seattle Coffee 300
# spread to wide using the tidyr-package (in tidyverse)
data_wide <- spread(data_long, key = "Category", value = "Amount", fill = 0)
data_wide
#> # A tibble: 3 x 5
#> # Groups: City [3]
#> City Books Coffee Film Music
#> * <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Boston 600 0 0 0
#> 2 Los Angeles 0 0 300 400
#> 3 Seattle 0 300 0 0
mat <- as.matrix(data_wide %>% ungroup() %>% select(-City))
rownames(mat) <- data_wide$City
mat
#> Books Coffee Film Music
#> Boston 600 0 0 0
#> Los Angeles 0 0 300 400
#> Seattle 0 300 0 0
str(mat)
#> num [1:3, 1:4] 600 0 0 0 0 300 0 300 0 0 ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:3] "Boston" "Los Angeles" "Seattle"
#> ..$ : chr [1:4] "Books" "Coffee" "Film" "Music"
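
In newer versions of tidyr, spread is superseded by pivot_wider, which can do the summing and the reshaping in one step. The following is only a sketch of that alternative, not part of the original answer. It assumes tidyr 1.1.0 or later (so that values_fn accepts a single function and values_fill a single value) and reuses the example data from above:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the answer above
data <- tribble(
  ~City,         ~Amount, ~Category,
  "Los Angeles", 100,     "Film",
  "Los Angeles", 200,     "Film",
  "Los Angeles", 400,     "Music",
  "Seattle",     300,     "Coffee",
  "Boston",      600,     "Books"
)

# Sum Amount per City/Category while widening;
# City/Category combinations with no rows are filled with 0
# (assumes tidyr >= 1.1.0)
data_wide <- data %>%
  pivot_wider(
    id_cols     = City,
    names_from  = Category,
    values_from = Amount,
    values_fn   = sum,
    values_fill = 0
  )

# Optional: turn the result into a numeric matrix with cities as row names
mat <- data_wide %>%
  column_to_rownames("City") %>%
  as.matrix()
```

Because values_fn does the aggregation, the separate group_by/summarise step is not needed here. Rows and columns come out in order of first appearance in the data rather than sorted alphabetically as with spread.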