How can I turn yes/no rows into proportions, preferably with dplyr?

Asked: 2016-08-22 00:45:14

Tags: r dplyr

The data comes from Gistlyn.

Here is the script:

library(dplyr)
library(ggplot2)
load("brfss2013.RData")

test <- brfss2013 %>%
  select(chcscncr,exract11) %>% 
  filter(chcscncr != "NA" , exract11 != "NA") %>% 
  group_by(exract11,chcscncr) %>% 
  summarise(count = n())

This produces the following table:

> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]

                                                  exract11 chcscncr count
                                                    <fctr>   <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287
3                                  Aerobics video or class      Yes   800
4                                  Aerobics video or class       No  7340
5                                              Backpacking      Yes     4
6                                              Backpacking       No    38

I want to build a table that gives the proportion of "Yes" answers for each exercise type, that is, go from:

Type     Ans Count
Sport A  yes 45
Sport A  no  55
Sport B  yes 34
Sport B  no  66

to:

Type      p(yes)
Sport A   0.45
Sport B   0.34

1 Answer:

Answer 0 (score: 4)

prop.table converts counts into proportions (here, within each group, it is simply x/sum(x)), so for your "From" table:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>%    # `==` doesn't work for NA
    count(exract11, chcscncr) %>%    # equivalent to `group_by(...) %>% summarise(n = n())`
    group_by(exract11) %>%
    mutate(pct = prop.table(n) * 100)    # `* 100` to convert to percent

## Source: local data frame [144 x 4]
## Groups: exract11 [75]
## 
##                                                    exract11 chcscncr     n      pct
##                                                      <fctr>   <fctr> <int>    <dbl>
## 1  Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19  6.20915
## 2  Active Gaming Devices (Wii Fit, Dance, Dance revolution)       No   287 93.79085
## 3                                   Aerobics video or class      Yes   800  9.82801
## 4                                   Aerobics video or class       No  7340 90.17199
## 5                                               Backpacking      Yes     4  9.52381
## 6                                               Backpacking       No    38 90.47619
## 7                                                 Badminton      Yes     4 10.52632
## 8                                                 Badminton       No    34 89.47368
## 9                                                Basketball      Yes    37  1.64664
## 10                                               Basketball       No  2210 98.35336
## # ... with 134 more rows
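
To make the x/sum(x) behaviour concrete, here is a minimal sketch on the toy "Sport A / Sport B" counts from the question. The data frame `toy` and the column name `p` are made up for illustration and are not part of brfss2013.

```r
library(dplyr)

# Hypothetical toy counts mirroring the "from" table in the question
toy <- data.frame(
  Type  = c("Sport A", "Sport A", "Sport B", "Sport B"),
  Ans   = c("yes", "no", "yes", "no"),
  Count = c(45, 55, 34, 66)
)

# On a plain vector, prop.table() divides each element by the total
prop.table(c(45, 55))   # 0.45 and 0.55

# After group_by(), the same division is done separately for each Type
toy_prop <- toy %>%
  group_by(Type) %>%
  mutate(p = prop.table(Count)) %>%
  ungroup()

toy_prop
```

Because each group's counts are divided by that group's own total, the proportions within every Type sum to 1.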

For your "To" table, filter to keep only the "Yes" rows:

brfss2013 %>%
    select(chcscncr,exract11) %>% 
    na.omit() %>% 
    count(exract11, chcscncr) %>%
    group_by(exract11) %>%
    mutate(p_yes = prop.table(n)) %>%
    filter(chcscncr == "Yes")

## Source: local data frame [69 x 4]
## Groups: exract11 [69]
## 
##                                                                 exract11 chcscncr     n      p_yes
##                                                                   <fctr>   <fctr> <int>      <dbl>
## 1               Active Gaming Devices (Wii Fit, Dance, Dance revolution)      Yes    19 0.06209150
## 2                                                Aerobics video or class      Yes   800 0.09828010
## 3                                                            Backpacking      Yes     4 0.09523810
## 4                                                              Badminton      Yes     4 0.10526316
## 5                                                             Basketball      Yes    37 0.01646640
## 6                                             Bicycling machine exercise      Yes   987 0.13708333
## 7                                                              Bicycling      Yes   728 0.08519602
## 8  Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping)      Yes    22 0.11518325
## 9                                                                Bowling      Yes    68 0.09985316
## 10                                                               Boxing       Yes     5 0.01633987
## # ... with 59 more rows

As the first table already shows, the "Yes" proportions are quite small.
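
One thing to watch: the filtered result has 69 rows while the unfiltered one has 75 groups, so exercise types with no "Yes" answers drop out instead of appearing with a proportion of 0. A possible alternative, sketched below on hypothetical toy data (`toy`, `exercise` and `answer` are illustrative names), is to skip the counting step and take the mean of a logical comparison in each group. On the real data, the same idea would presumably be `summarise(p_yes = mean(chcscncr == "Yes"))` after `na.omit()` and `group_by(exract11)`.

```r
library(dplyr)

# Hypothetical respondent-level data; "Sport C" has no "Yes" answers at all
toy <- data.frame(
  exercise = c("Sport A", "Sport A", "Sport A", "Sport A",
               "Sport B", "Sport B", "Sport C", "Sport C"),
  answer   = c("Yes", "No", "No", "No", "Yes", "No", "No", "No")
)

# mean() of a logical vector is the share of TRUE values,
# so this gives the proportion of "Yes" per exercise type
p_yes_tbl <- toy %>%
  group_by(exercise) %>%
  summarise(p_yes = mean(answer == "Yes"))

p_yes_tbl
```

Here "Sport C" stays in the result with p_yes equal to 0, whereas the count-then-filter approach would leave it out.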