Calculating ethnicity percentages in R

Time: 2020-09-07 03:29:25

Tags: r dplyr

I am analyzing the Males dataset from the Ecdat package in R.

I want to calculate, for each ethnic group (black, hisp and other), the percentage of people who belong to a union.

The structure of the data is:


 > str(Males)
 
 'data.frame':  4360 obs. of  12 variables:

 $ nr        : int  13 13 13 13 13 13 13 13 17 17 ...
 $ year      : int  1980 1981 1982 1983 1984 1985 1986 1987 1980 1981 ...
 $ school    : int  14 14 14 14 14 14 14 14 13 13 ...
 $ exper     : int  1 2 3 4 5 6 7 8 4 5 ...
 $ union     : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
 $ ethn      : Factor w/ 3 levels "other","black",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ maried    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ health    : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ wage      : num  1.2 1.85 1.34 1.43 1.57 ...
 $ industry  : Factor w/ 12 levels "Agricultural",..: 7 8 7 7 8 7 7 7 4 4 ...
 $ occupation: Factor w/ 9 levels "Professional, Technical_and_kindred",..: 9 9 9 9 5 2 2 2 2 2 ...
 $ residence : Factor w/ 4 levels "rural_area","north_east",..: 2 2 2 2 2 2 2 2 2 2 ...

The following code selects the rows for 1980:

Males %>% 
  filter(year == '1980') %>%
  select(union, ethn)
        union  ethn
1       no    other
9       no    other
17      no    other
25     yes    other
33     yes    hisp
41      no    hisp
49      no    other
57      no    other
65     yes    black
...    ...    ...

The final result should look like this:


Year: 1980:

union ethn    pct
no    other   0.25
no    black   0.25
no    hisp    ...
yes   other   ...
yes   black   ...
yes   hisp    ...

Year: 1981:

union ethn    pct
no    other   0.25
no    black   0.25
no    hisp    ...
yes   other   ...
yes   black   ...
yes   hisp    ...


....

2 answers:

Answer 0 (score: 1)

You can solve it with group_by() and summarize(), as follows:

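library(dplyr)

# Start the pipeline from the Males data frame itself
Males %>%
  filter(year == '1980') %>%
  select(union, ethn) %>%
  group_by(ethn) %>%
  # Percentage of union members and non-members within each ethnic group
  summarize(yes = sum(union == 'yes') * 100 / n(),
            no = sum(union == 'no') * 100 / n())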

Here is the output:

  # A tibble: 3 x 3
    ethn    yes    no
    <fct> <dbl> <dbl>
  1 other  22.2  77.8
  2 black  36.5  63.5
  3 hisp   30.6  69.4
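
The question asks for the same breakdown for every year, not only 1980. One way this could be sketched is to group by year as well and compute each union share within its year and ethnic group. The data frame `toy` below is a hypothetical stand-in with the same column names as Males (year, ethn, union), so the snippet runs without Ecdat installed; the name `pct_by_year` is also only illustrative. Note that the shares are taken within each ethnic group, as in the answer above, which is an assumption about what the question intends by pct.

```r
library(dplyr)

# Hypothetical toy data with the same column names as Males;
# replace `toy` with Males to run this on the real data.
toy <- data.frame(
  year  = c(1980, 1980, 1980, 1980, 1981, 1981, 1981, 1981),
  ethn  = factor(c("other", "other", "black", "black",
                   "other", "other", "black", "black")),
  union = factor(c("no", "yes", "no", "no",
                   "yes", "yes", "no", "yes"))
)

pct_by_year <- toy %>%
  count(year, ethn, union) %>%   # one row per observed year/ethn/union combination
  group_by(year, ethn) %>%       # shares are computed within each year and ethnic group
  mutate(pct = n / sum(n)) %>%   # proportion (multiply by 100 for a percentage)
  ungroup()

pct_by_year
```

Combinations that never occur in the data (for example, no union members in a given group and year) simply produce no row here rather than a row with pct equal to 0.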

Answer 1 (score: 1)

In the meantime, I found another way to answer this question, using the function pct_routine.

  df1980 <- Males %>% 
    filter(year == '1980') %>%
    select(union, ethn) 

   pct.1980 <- pct_routine(df1980, ethn,union)
   pct.1980

The result matches the one suggested by rodolfosveiga, expressed as proportions rather than percentages:

  # A tibble: 6 x 3
  # Groups:   ethn [3]
    ethn  union   pct
    <fct> <fct> <dbl>
  1 other no    0.778
  2 other yes   0.222
  3 black no    0.635
  4 black yes   0.365
  5 hisp  no    0.694
  6 hisp  yes   0.306
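
As a cross-check that needs no extra packages, the same within-group proportions can be obtained in base R with table() and prop.table(). This is only a sketch: `toy1980` is a hypothetical stand-in for the 1980 subset (df1980 above), with made-up rows, so its numbers will not match the real output.

```r
# Hypothetical toy data standing in for the 1980 subset (df1980 above).
toy1980 <- data.frame(
  ethn  = factor(c("other", "other", "other", "black", "black", "hisp"),
                 levels = c("other", "black", "hisp")),
  union = factor(c("no", "no", "yes", "no", "yes", "no"),
                 levels = c("no", "yes"))
)

counts <- table(toy1980$ethn, toy1980$union)  # ethn in rows, union in columns
shares <- prop.table(counts, margin = 1)      # margin = 1: each row sums to 1
shares
```

Because table() keeps all factor levels, a group with no union members still appears with a share of 0 in the yes column.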