The data comes from Gistlyn.
Here is the script:
library(dplyr)
library(ggplot2)
load("brfss2013.RData")
test <- brfss2013 %>%
select(chcscncr,exract11) %>%
filter(chcscncr != "NA" , exract11 != "NA") %>%
group_by(exract11,chcscncr) %>%
summarise(count = n())
The resulting table looks like this:
> head(test)
Source: local data frame [6 x 3]
Groups: exract11 [3]
exract11 chcscncr count
<fctr> <fctr> <int>
1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19
2 Active Gaming Devices (Wii Fit, Dance, Dance revolution) No 287
3 Aerobics video or class Yes 800
4 Aerobics video or class No 7340
5 Backpacking Yes 4
6 Backpacking No 38
I want to build a table that gives the proportion of "Yes" answers for each exercise type, i.e. go from this:
Type Ans Count
Sport A yes 45
Sport A no 55
Sport B yes 34
Sport B no 66
to this:
Type p(yes)
Sport A 0.45
Sport B 0.34
Answer 0 (score: 4)
prop.table converts counts to proportions (here, within each group, it is just x/sum(x)), so for your "from" table:
brfss2013 %>%
select(chcscncr,exract11) %>%
na.omit() %>% # `==` doesn't work for NA
count(exract11, chcscncr) %>% # equivalent to `group_by(...) %>% summarise(n = n())`
group_by(exract11) %>%
mutate(pct = prop.table(n) * 100) # `* 100` to convert to percent
## Source: local data frame [144 x 4]
## Groups: exract11 [75]
##
## exract11 chcscncr n pct
## <fctr> <fctr> <int> <dbl>
## 1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19 6.20915
## 2 Active Gaming Devices (Wii Fit, Dance, Dance revolution) No 287 93.79085
## 3 Aerobics video or class Yes 800 9.82801
## 4 Aerobics video or class No 7340 90.17199
## 5 Backpacking Yes 4 9.52381
## 6 Backpacking No 38 90.47619
## 7 Badminton Yes 4 10.52632
## 8 Badminton No 34 89.47368
## 9 Basketball Yes 37 1.64664
## 10 Basketball No 2210 98.35336
## # ... with 134 more rows
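To see what prop.table does inside one group, here is a minimal base R sketch using the two counts shown above for the first activity (19 "Yes", 287 "No"). The vector name n_group is only illustrative.

```r
# Counts for one exercise type, copied from the output above
n_group <- c(Yes = 19, No = 287)

# prop.table() on a plain vector divides each element by the vector's sum
p <- prop.table(n_group)

# Same thing written out by hand
p_manual <- n_group / sum(n_group)

# Convert to percent, rounded as in the printed table
pct <- round(p * 100, 5)
print(pct)
```

This should reproduce the 6.20915 and 93.79085 shown in the pct column for that activity.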
For your "to" table, filter to keep only the "Yes" rows:
brfss2013 %>%
select(chcscncr,exract11) %>%
na.omit() %>%
count(exract11, chcscncr) %>%
group_by(exract11) %>%
mutate(p_yes = prop.table(n)) %>%
filter(chcscncr == "Yes")
## Source: local data frame [69 x 4]
## Groups: exract11 [69]
##
## exract11 chcscncr n p_yes
## <fctr> <fctr> <int> <dbl>
## 1 Active Gaming Devices (Wii Fit, Dance, Dance revolution) Yes 19 0.06209150
## 2 Aerobics video or class Yes 800 0.09828010
## 3 Backpacking Yes 4 0.09523810
## 4 Badminton Yes 4 0.10526316
## 5 Basketball Yes 37 0.01646640
## 6 Bicycling machine exercise Yes 987 0.13708333
## 7 Bicycling Yes 728 0.08519602
## 8 Boating (Canoeing, rowing, kayaking, sailing for pleasure or camping) Yes 22 0.11518325
## 9 Bowling Yes 68 0.09985316
## 10 Boxing Yes 5 0.01633987
## # ... with 59 more rows
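The same steps can be checked against the small Sport A / Sport B example from the question. This is a sketch on hand-typed toy data (the data frame name toy is made up), starting from counts that are already aggregated, so count() is not needed.

```r
library(dplyr)

# Toy "from" table, copied from the question
toy <- data.frame(
  Type  = c("Sport A", "Sport A", "Sport B", "Sport B"),
  Ans   = c("yes", "no", "yes", "no"),
  Count = c(45, 55, 34, 66)
)

p_table <- toy %>%
  group_by(Type) %>%
  mutate(p_yes = prop.table(Count)) %>%  # share of each answer within its Type
  ungroup() %>%
  filter(Ans == "yes") %>%               # keep only the "yes" share
  select(Type, p_yes)

print(p_table)
```

The result should have one row per sport, with p_yes of 0.45 for Sport A and 0.34 for Sport B, matching the "to" table in the question.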
As the first table shows, the proportion of "Yes" answers is very small.
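One thing to watch: the first output has 75 groups but the filtered one has only 69 rows, because an activity with no "Yes" respondents has no "Yes" row to keep. If those activities should appear with a proportion of 0, one possible alternative (not part of the answer above) is to summarise the raw rows directly with mean() of a logical. The sketch below uses hypothetical toy data. The names toy_raw, sport and cancer are placeholders, not columns of brfss2013.

```r
library(dplyr)

# Hypothetical toy data: one row per respondent
toy_raw <- data.frame(
  sport  = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cancer = c("Yes", "No", "No", "No", "Yes", "No", "No", "No")
)

p_by_sport <- toy_raw %>%
  group_by(sport) %>%
  # mean() of a logical vector is the fraction of TRUE values
  summarise(p_yes = mean(cancer == "Yes"))

print(p_by_sport)
```

Sport C has no "Yes" rows and still appears, with p_yes equal to 0. On the real data, NA values would still need to be dropped first (for example with na.omit() as above), since mean() returns NA if any comparison is NA.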