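I am working with the birthwt dataset.
For each age, I want to find the percentage of mothers who are white. My end goal is to display that percentage by age. How can I do this? I am learning the tidyverse functions, so I would prefer that approach if possible. Here is my work so far: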
library(tidyverse)
library(tidyselect)
library("MASS")
grouped <- birthwt %>%
count(race, age) %>%
spread(key = race, value = n, fill = 0)
grouped
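This produces a table in which each row is an age and each race has its own column holding the number of mothers of that age. This may or may not be the right approach.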
Answer 0 (score: 2)
We can count the rows with white race (race == 1) for each age and divide that count by the total number of rows for that age to get the ratio.
library(dplyr)
birthwt %>%
group_by(age) %>%
summarise(perc = sum(race == 1)/n())
# A tibble: 24 x 2
# age perc
# <int> <dbl>
# 1 14 0.333
# 2 15 0.333
# 3 16 0.286
# 4 17 0.25
# 5 18 0.6
# 6 19 0.625
# 7 20 0.333
# 8 21 0.417
# 9 22 0.769
#10 23 0.308
# … with 14 more rows
Or, in base R, we can follow the same logic with aggregate:
aggregate(race~age, birthwt,function(x) sum(x == 1)/length(x))
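The question's stated goal is to display the percentage by age. A minimal sketch of one way to do that, assuming ggplot2 is an acceptable plotting choice (the answer itself does not plot anything, and the names perc_by_age and p are only illustrative):

```r
library(dplyr)
library(ggplot2)

# Load the data without attaching MASS, so MASS::select does not mask dplyr::select
data(birthwt, package = "MASS")

# Proportion of white mothers (race == 1) at each age
perc_by_age <- birthwt %>%
  group_by(age) %>%
  summarise(perc = mean(race == 1))

# Bar chart with one bar per age; the y axis is formatted as a percentage
p <- ggplot(perc_by_age, aes(x = age, y = perc)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Mother's age (years)", y = "White mothers")
p
```

Some ages have only a handful of mothers, so individual bars can swing widely; that is a property of the data rather than of the plot.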
Answer 1 (score: 2)
We can group by age and take the mean of the logical vector race == 1:
library(dplyr)
birthwt %>%
group_by(age) %>%
summarise(perc = mean(race == 1))
# A tibble: 24 x 2
# age perc
# <int> <dbl>
# 1 14 0.333
# 2 15 0.333
# 3 16 0.286
# 4 17 0.25
# 5 18 0.6
# 6 19 0.625
# 7 20 0.333
# 8 21 0.417
# 9 22 0.769
#10 23 0.308
# … with 14 more rows
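Taking the mean of a logical vector works here because R treats TRUE as 1 and FALSE as 0, so the mean equals the share of TRUE values. A tiny illustration with made-up values (is_white is a hypothetical vector, not taken from birthwt):

```r
# Hypothetical logical vector: three of four values are TRUE
is_white <- c(TRUE, FALSE, TRUE, TRUE)

# Counting TRUE values and dividing by the length ...
prop_sum <- sum(is_white) / length(is_white)

# ... gives the same number as the mean of the logical vector
prop_mean <- mean(is_white)

prop_sum   # 0.75
prop_mean  # 0.75
```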
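Or an option with data.table:
library(data.table)
# as.data.table() works on a copy; setDT(birthwt) can fail when birthwt is still the locked binding inside MASS
as.data.table(birthwt)[, .(perc = mean(race == 1)), by = age]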
Or using base R:
birthwt$perc <- with(birthwt, ave(race == 1, age))
Or another base R option is:
with(birthwt, tapply(race == 1, age, FUN = mean))
Or with aggregate:
aggregate(cbind(perc = race == 1) ~ age, birthwt, FUN = mean)
Or with by
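No code for the by option appears in this answer. One way such a call might look, offered as a sketch rather than as the answer's own code (the name perc_by is illustrative):

```r
# Load the data without attaching MASS
data(birthwt, package = "MASS")

# by() splits the data frame by age and passes each subset to the function,
# so the function receives a data frame and extracts race itself
perc_by <- by(birthwt, birthwt$age, function(d) mean(d$race == 1))
perc_by
```

The function takes the whole subset because by() hands it a data frame; passing mean directly would not work on a data frame.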