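I want to find out how many people in my dataset are both unemployed and African American/Black. I have a large dataset that includes the variables OCC (unemployed people are coded as 0) and RACE (African American/Black is coded as 2).
I tried using the group_by() function from the tidyverse, but I think I may be doing something wrong, because I get the error message shown below.
Here is the code: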
RACE <- group_by(cps_data, OCC, RACE)
occupation <- summarise(RACE,
count = n(),
OCC = mean(OCC, na.rm = TRUE)
)
summarise(RACE, occupation = mean(OCC, na.rm = TRUE))
Creating the occupation object gives me this error message:
Error in summarise_impl(.data, dots) :
Column `OCC` can't be modified because it's a grouping variable
The final summarise call gives me a tibble:
# A tibble: 1,374 x 3
# Groups:   OCC [?]
     OCC  RACE occupation
   <int> <int>      <dbl>
 1     0     1          0
 2     0     2          0
 3     0     3          0
 4     0     4          0
 5     0     5          0
 6     0     6          0
 7     0     7          0
 8     0     8          0
 9     0     9          0
10    10     1         10
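Below is a sample of my data, which I have tried to make reproducible for you. As you can see above, I made another data frame that uses only OCC and RACE, since those are the only variables that matter right now.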
dput(head(cps_data,4))
structure(list(YEAR = c(2015L, 2015L, 2015L, 2015L), DATANUM = c(1L,
1L, 1L, 1L), SERIAL = c(1029644L, 1029644L, 1029705L, 1029708L
), CBSERIAL = c(403, 403, 1944, 1964), HHWT = c(194L, 194L, 142L,
77L), STATEICP = c(14L, 14L, 14L, 14L), STATEFIP = c(42L, 42L,
42L, 42L), CITY = c(5330L, 5330L, 5330L, 5330L), GQ = c(1L, 1L,
1L, 1L), PERNUM = c(1L, 3L, 1L, 1L), PERWT = c(194L, 140L, 142L,
78L), SEX = c(2L, 1L, 2L, 1L), AGE = c(37L, 35L, 60L, 41L), RACE = c(1L,
1L, 2L, 2L), RACED = c(100L, 100L, 200L, 200L), OCC = c(800L,
6260L, 0L, 350L), IND = c(7270L, 770L, 0L, 8190L), INCWAGE = c(75000L,
25000L, 0L, 83000L)), row.names = c(NA, 4L), class = "data.frame")
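I would like output that shows how many people in my data are unemployed and also identify as African American/Black, so that I can make comparisons within my dataset.
Answer 0 (score: 0)
If I understand you correctly, you are almost there.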
df %>%
group_by(OCC, RACE) %>%
summarize(count = n())
# A tibble: 4 x 3
# Groups:   OCC [4]
    OCC  RACE count
  <int> <int> <int>
1     0     2     1
2   350     2     1
3   800     1     1
4  6260     1     1
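The error in the question comes from `OCC = mean(OCC, na.rm = TRUE)` inside `summarise()`: OCC is one of the grouping variables, and dplyr does not let a summary overwrite a grouping column. If you only need the single count of unemployed respondents coded as Black, rather than every OCC/RACE combination, one possible approach is to filter first and then count. The sketch below uses a toy data frame, `df_small` (a name made up for this example), holding only the OCC and RACE values from the four sample rows in the question.

```r
library(dplyr)

# Toy data: OCC and RACE copied from the four sample rows in the question.
# df_small is a hypothetical name used only for this illustration.
df_small <- data.frame(
  OCC  = c(800L, 6260L, 0L, 350L),
  RACE = c(1L, 1L, 2L, 2L)
)

# Keep rows that are unemployed (OCC == 0) and coded as Black (RACE == 2),
# then count how many rows remain.
n_unemployed_black <- df_small %>%
  filter(OCC == 0, RACE == 2) %>%
  summarise(count = n())

n_unemployed_black
```

On the four sample rows this gives a count of 1. On the full data, replace `df_small` with `cps_data`.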
library(tidyverse)
df <- structure(list(YEAR = c(2015L, 2015L, 2015L, 2015L), DATANUM = c(1L,
1L, 1L, 1L), SERIAL = c(1029644L, 1029644L, 1029705L, 1029708L
), CBSERIAL = c(403, 403, 1944, 1964), HHWT = c(194L, 194L, 142L,
77L), STATEICP = c(14L, 14L, 14L, 14L), STATEFIP = c(42L, 42L,
42L, 42L), CITY = c(5330L, 5330L, 5330L, 5330L), GQ = c(1L, 1L,
1L, 1L), PERNUM = c(1L, 3L, 1L, 1L), PERWT = c(194L, 140L, 142L,
78L), SEX = c(2L, 1L, 2L, 1L), AGE = c(37L, 35L, 60L, 41L), RACE = c(1L,
1L, 2L, 2L), RACED = c(100L, 100L, 200L, 200L), OCC = c(800L,
6260L, 0L, 350L), IND = c(7270L, 770L, 0L, 8190L), INCWAGE = c(75000L,
25000L, 0L, 83000L)), row.names = c(NA, 4L), class = "data.frame")
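Since the goal is to compare groups, it may also help to summarise by RACE alone and count the unemployed within each group. This is only a sketch under the question's coding (OCC == 0 means unemployed). The names `df_small`, `by_race`, `unemployed`, `total` and `share` are made up for the example.

```r
library(dplyr)

# Toy data: OCC and RACE copied from the four sample rows in the question.
df_small <- data.frame(
  OCC  = c(800L, 6260L, 0L, 350L),
  RACE = c(1L, 1L, 2L, 2L)
)

# For each RACE code: number unemployed, group size, and unemployed share.
# OCC == 0 is TRUE/FALSE, so sum() counts the TRUEs and mean() gives the share.
by_race <- df_small %>%
  group_by(RACE) %>%
  summarise(
    unemployed = sum(OCC == 0, na.rm = TRUE),
    total      = n(),
    share      = mean(OCC == 0, na.rm = TRUE)
  )

by_race
```

Grouping only by RACE leaves OCC free to be used inside `summarise()`, which avoids the grouping-variable error from the question.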