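How can I summarize a table in R by turning some row labels into a grouping column and spreading the counts into columns?

Time: 2019-07-09 09:57:32

Tags: r

I have some data that looks like this: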

   Category Count
   <chr>    <chr>
 1 X0101    <NA>
 2 17       1
 3 22       1
 4 23       1
 5 27       1
 6 34       1
 7 35       2
 8 40       1
 9 51       1
10 66       1
11 X0102    <NA>
12 51       1
13 53       1
14 59       1
15 61       1
16 X0103    <NA>
17 10       1
18 22       1
19 17       1

This is the code that generates my data frame:

 structure(list(`Row Labels` = c("X0101", "17", "22", "23", 
    "27", "34", "35", "40", "51", "66", "X0102", "51", "53", 
    "59", "61", "X0103", "10", "22", "17"), `Count` = c(NA, 
    "1", "1", "1", "1", "1", "2", "1", "1", "1", NA, "1", "1", "1", 
    "1", NA, "1", "1", "1")), .Names = c("Category", "Count"), row.names = c(NA, 
    -19L), class = c("tbl_df", "tbl", "data.frame"))

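I want to reshape my table so that it has only one row for each of "X0101", "X0102" and "X0103", with one column per subcategory holding that subcategory's count. I am new to R and not sure what code would achieve this.

This is the output I want: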

Category  10  17  22  23  27  34  35  40  51  53  59  61  66
X0101          1   1   1   1   1   2   1   1               1
X0102                                      1   1   1   1
X0103      1   1   1

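2 answers:

Answer 0 (score: 0)

One possibility with dplyr and tidyr is shown below. The printed result uses the group labels Lower, Higher and Medium where the data posted in the question has X0101, X0102 and X0103, and it does not show the `66` column: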

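library(dplyr)
library(tidyr)

df %>%
 # every NA in Count marks a header row, so cumsum() numbers the blocks
 group_by(grp = cumsum(is.na(Count))) %>%
 # copy the header label (first Category of each block) onto its rows
 mutate(Category2 = first(Category)) %>%
 ungroup() %>%
 # drop the header rows, then the helper column
 na.omit() %>%
 select(-grp) %>%
 spread(Category, Count)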

  Category2 `10`  `17`  `22`  `23`  `27`  `34`  `35`  `40`  `51`  `53`  `59`  `61` 
  <chr>     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Higher    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  1     1     1     1    
2 Lower     <NA>  1     1     1     1     1     2     1     1     <NA>  <NA>  <NA> 
3 Medium    1     1     1     <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
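
The grouping step in this answer relies on every header row having `NA` in `Count`. As a minimal sketch of that idea in base R (the two short vectors are a trimmed-down stand-in for the question's data, and the names `grp` and `header` are only illustrative):

```r
# Shortened toy version of the question's two columns (an assumption made
# for illustration; the real data has 19 rows).
Category <- c("X0101", "17", "22", "X0102", "51", "X0103", "10")
Count    <- c(NA, "1", "1", NA, "1", NA, "1")

# is.na(Count) is TRUE only on header rows, so the running total goes up
# by one at each header and stays constant inside its block.
grp <- cumsum(is.na(Count))

# Look up each block's header label by block number. This assumes the
# very first row is a header, so that block numbers start at 1.
header <- Category[is.na(Count)][grp]

print(data.frame(Category, Count, grp, header))
```

Once every row carries its header label, removing the header rows and spreading `Category` into columns gives one row per header.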

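Answer 1 (score: 0)

Here is one way using dplyr and tidyr. Get all the non-numeric values in the Category column (vals), create a grouping variable using cumsum and factor with vals as the labels, remove the NA rows, and spread to wide format. The printed result uses the labels Lower, Higher and Medium where the data posted in the question has X0101, X0102 and X0103.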

library(dplyr)
library(tidyr)

vals <- grep("^\\d+$", df$Category, invert = TRUE, value = TRUE)

df %>%
  mutate(temp = factor(cumsum(Category %in% vals), labels = vals)) %>%
  na.omit %>%
  spread(Category, Count)

# A tibble: 3 x 14
#  temp   `10`  `17`  `22`  `23`  `27`  `34`  `35`  `40`  `51`  `53`  `59`  `61`  `66` 
#  <fct>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 Lower  NA    1     1     1     1     1     2     1     1     NA    NA    NA    1    
#2 Higher NA    NA    NA    NA    NA    NA    NA    NA    1     1     1     1     NA   
#3 Medium 1     1     1     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
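
Both answers use `spread()`, which tidyr has since superseded with `pivot_wider()`. The following is a rough sketch of the same reshaping with `fill()` and `pivot_wider()`, assuming tidyr 1.0.0 or later. The column name `Group`, the conversion of `Count` to integer, and filling missing combinations with 0 are choices made here for illustration, not part of either answer; drop `values_fill` to keep `NA` instead.

```r
library(dplyr)
library(tidyr)

# The data from the question, rebuilt as a plain tibble.
df <- tibble(
  Category = c("X0101", "17", "22", "23", "27", "34", "35", "40", "51", "66",
               "X0102", "51", "53", "59", "61", "X0103", "10", "22", "17"),
  Count = c(NA, "1", "1", "1", "1", "1", "2", "1", "1", "1",
            NA, "1", "1", "1", "1", NA, "1", "1", "1")
)

wide <- df %>%
  # header rows are the ones with NA in Count; keep their label, NA elsewhere
  mutate(Group = if_else(is.na(Count), Category, NA_character_)) %>%
  # carry each header label down onto the rows beneath it
  fill(Group) %>%
  # drop the header rows themselves
  filter(!is.na(Count)) %>%
  mutate(Count = as.integer(Count)) %>%
  # pivot_wider() creates columns in order of appearance, so sort first
  arrange(as.integer(Category)) %>%
  pivot_wider(names_from = Category, values_from = Count,
              values_fill = list(Count = 0L)) %>%
  arrange(Group)

print(wide)
```

Storing the counts as integers rather than character strings makes later arithmetic on them, such as row totals, straightforward.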