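I am analyzing the Males dataset from the Ecdat package in R.
I want to compute, for each ethnic group (black, hisp, and other), the percentage of people who are union members.
The structure of the data is:
> str(Males)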
'data.frame': 4360 obs. of 12 variables:
$ nr : int 13 13 13 13 13 13 13 13 17 17 ...
$ year : int 1980 1981 1982 1983 1984 1985 1986 1987 1980 1981 ...
$ school : int 14 14 14 14 14 14 14 14 13 13 ...
$ exper : int 1 2 3 4 5 6 7 8 4 5 ...
$ union : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
$ ethn : Factor w/ 3 levels "other","black",..: 1 1 1 1 1 1 1 1 1 1 ...
$ maried : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ health : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ wage : num 1.2 1.85 1.34 1.43 1.57 ...
$ industry : Factor w/ 12 levels "Agricultural",..: 7 8 7 7 8 7 7 7 4 4 ...
$ occupation: Factor w/ 9 levels "Professional, Technical_and_kindred",..: 9 9 9 9 5 2 2 2 2 2 ...
$ residence : Factor w/ 4 levels "rural_area","north_east",..: 2 2 2 2 2 2 2 2 2 2 ...
The following code selects the rows for 1980:
library(dplyr)
library(Ecdat)

Males %>%
  filter(year == 1980) %>%
  select(union, ethn)
union ethn
1 no other
9 no other
17 no other
25 yes other
33 yes hisp
41 no hisp
49 no other
57 no other
65 yes black
... ... ...
The final result should look like this:
Year: 1980:
union ethn pct
no other 0.25
no black 0.25
no hisp ...
yes other ...
yes black ...
yes hisp ...
Year: 1981:
union ethn pct
no other 0.25
no black 0.25
no hisp ...
yes other ...
yes black ...
yes hisp ...
....
Answer 0 (score: 1)
You can solve it with group_by() and summarize(), as follows:
# Percentage of union members and non-members within each ethnic group, 1980 only
Males %>%
  filter(year == 1980) %>%
  select(union, ethn) %>%
  group_by(ethn) %>%
  summarize(yes = sum(union == 'yes') * 100 / n(),
            no  = sum(union == 'no') * 100 / n())
Here is the output:
# A tibble: 3 x 3
ethn yes no
<fct> <dbl> <dbl>
1 other 22.2 77.8
2 black 36.5 63.5
3 hisp 30.6 69.4
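The question asks for the same breakdown for every year, not only 1980. A minimal sketch of one way to get that, assuming dplyr 1.0.0 or later (for the `.groups` argument) and that `Males` is loaded from Ecdat; the name `union_by_year` is introduced here for illustration only:

```r
library(dplyr)
library(Ecdat)  # provides the Males data set

# Group by year as well as ethnic group, so each (year, ethn) pair
# gets its own percentages. mean() of a logical vector is the share
# of TRUE values, which is equivalent to sum(...) / n().
union_by_year <- Males %>%
  group_by(year, ethn) %>%
  summarize(yes = mean(union == "yes") * 100,
            no  = mean(union == "no") * 100,
            .groups = "drop")

union_by_year
```

Dropping the `filter()` step and adding `year` to `group_by()` is the only change from the 1980-only version; the rows for 1980 can still be pulled out afterwards with `filter(union_by_year, year == 1980)`.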
Answer 1 (score: 1)
In the meantime, I found another way to answer this question, using the function pct_routine.
df1980 <- Males %>%
  filter(year == 1980) %>%
  select(union, ethn)

pct.1980 <- pct_routine(df1980, ethn, union)
pct.1980
The result matches the one suggested by rodolfosveiga, expressed as proportions rather than percentages:
# A tibble: 6 x 3
# Groups: ethn [3]
ethn union pct
<fct> <fct> <dbl>
1 other no 0.778
2 other yes 0.222
3 black no 0.635
4 black yes 0.365
5 hisp no 0.694
6 hisp yes 0.306
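If pct_routine is not available, a long-format table of the same shape can probably be built with dplyr alone. The following is a sketch under that assumption, not a description of how pct_routine works internally; `pct_long` is a name made up for this example, and the calculation is extended to all years rather than 1980 only:

```r
library(dplyr)
library(Ecdat)  # provides the Males data set

# Count rows per (year, ethn, union), then divide each count by the
# total for its (year, ethn) group to get a within-group proportion.
pct_long <- Males %>%
  count(year, ethn, union) %>%
  group_by(year, ethn) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup() %>%
  select(-n)

# Rows for a single year, in the layout shown in the question
filter(pct_long, year == 1980)
```

Keeping `union` as a column instead of spreading it into `yes` and `no` gives one row per year, ethnic group, and union status, which is the layout the question asked for.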