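Calculate the percentage of rows in a group that have a specific value in another column

Time: 2019-07-28 02:23:42

Tags: r dplyr

I am working with the dataset birthwt.

For each age, I want to find the percentage of white mothers. My end goal is to display that percentage by age. How can I do this? I am learning how to use tidyverse functions, so I would prefer to do it that way if possible. Here is my work so far: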

library(tidyverse)
library(tidyselect)
library("MASS")

grouped <- birthwt %>%
  count(race, age)  %>%
  spread(key = race, value = n, fill = 0)

grouped

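This gives a table in which each row represents an age, with one column per race holding the number of mothers of that age. This approach may or may not be the right one.

2 answers:

Answer 0: (score: 2)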

We can count the number of rows with white `race` (`race == 1`) in each `age` and then divide it by the total number of rows in each age to get the ratio.

library(dplyr)
birthwt %>%
  group_by(age) %>%
  summarise(perc = sum(race == 1)/n())

# A tibble: 24 x 2
#     age  perc
#   <int> <dbl>
# 1    14 0.333
# 2    15 0.333
# 3    16 0.286
# 4    17 0.25 
# 5    18 0.6  
# 6    19 0.625
# 7    20 0.333
# 8    21 0.417
# 9    22 0.769
#10    23 0.308
# … with 14 more rows

In base R, we can use `aggregate` following the same logic:

aggregate(race ~ age, birthwt, function(x) sum(x == 1)/length(x))

Or, similar to your approach using `count`, the same ratio can be derived from the per-age, per-race counts.
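A minimal sketch of a `count()`-based variant that stays close to the attempt in the question. It is an illustration rather than code taken from the answer, and the object name `white_share` is an assumption.

```r
library(dplyr)

birthwt <- MASS::birthwt  # dataset used in the question

white_share <- birthwt %>%
  count(age, race) %>%   # one row per age/race combination, counts in column n
  group_by(age) %>%
  # share of rows with race == 1 within each age;
  # sum() over an empty selection is 0, so ages without white mothers get 0
  summarise(perc = sum(n[race == 1]) / sum(n))

white_share
```

Using `sum(n[race == 1])` instead of `n[race == 1]` keeps `summarise()` at one row per age even when an age has no rows with `race == 1`.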

Answer 1: (score: 2)

We can group by 'age' and get the `mean` of the logical `vector`:

library(dplyr)
birthwt %>%
  group_by(age) %>%
  summarise(perc = mean(race == 1))

# A tibble: 24 x 2
#     age  perc
#   <int> <dbl>
# 1    14 0.333
# 2    15 0.333
# 3    16 0.286
# 4    17 0.25 
# 5    18 0.6  
# 6    19 0.625
# 7    20 0.333
# 8    21 0.417
# 9    22 0.769
#10    23 0.308
# … with 14 more rows

Or an option with `data.table`:

library(data.table)
# as.data.table() makes a copy, so it also works when birthwt is still the locked package dataset
as.data.table(birthwt)[, .(perc = mean(race == 1)), age]

Or using base R, which adds the per-age proportion to every row:

birthwt$perc <- with(birthwt, ave(race == 1, age))

Or another base R option is

with(birthwt, tapply(race == 1, age, FUN = mean))

Or with `aggregate`:

aggregate(cbind(perc = race == 1) ~ age, birthwt, FUN = mean)

The same per-age mean can also be computed with `by`.
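A minimal sketch of how `by` could be applied here, assuming the same `race == 1` logic as the other options. It is an illustration rather than code taken from the answer, and the object name `white_by_age` is an assumption.

```r
birthwt <- MASS::birthwt  # dataset used in the question

# by() splits the logical vector race == 1 by age and applies mean() to each piece
white_by_age <- with(birthwt, by(race == 1, age, FUN = mean))

white_by_age
```

The result is an object of class "by", indexed by age, rather than a data frame, so it is mainly convenient for printing.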