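How can I create a new indicator variable, call it WorkingFamily, that equals 1 if any individual in the household is employed and 0 otherwise?
Individuals with the same SERIAL belong to the same household.
Employed individuals are those with EMPSTAT == 10 or EMPSTAT == 12.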
> dput(head(IPUMS.SDdata.MC))
structure(list(YEAR = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L
), SERIAL = c(89076L, 89077L, 89078L, 89079L, 89080L, 89104L),
HWTSUPP = c(30187500L, 30183100L, 28600900L, 21051300L, 31378100L,
17928900L), ASECFLAG = c(1L, 1L, 1L, 1L, 1L, 1L), COUNTY = c(6073L,
6073L, 6073L, 6073L, 6073L, 6073L), MONTH = c(3L, 3L, 3L,
3L, 3L, 3L), PERNUM = c(1L, 1L, 1L, 1L, 1L, 3L), WTSUPP = c(30187500L,
30183100L, 28600900L, 21051300L, 31378100L, 17497400L), FAMSIZE = c(1L,
1L, 1L, 1L, 1L, 4L), EMPSTAT = c(32L, 32L, 32L, 32L, 32L,
0L), HIMCAID = c(2L, 2L, 2L, 2L, 2L, 2L), PID = c("2016 3 89076 1",
"2016 3 89077 1", "2016 3 89078 1", "2016 3 89079 1", "2016 3 89080 1",
"2016 3 89104 3"), WTSUPP2 = c(3018.75, 3018.31, 2860.09,
2105.13, 3137.81, 1749.74)), .Names = c("YEAR", "SERIAL",
"HWTSUPP", "ASECFLAG", "COUNTY", "MONTH", "PERNUM", "WTSUPP",
"FAMSIZE", "EMPSTAT", "HIMCAID", "PID", "WTSUPP2"), row.names = c(174187L,
174188L, 174189L, 174190L, 174191L, 174248L), class = "data.frame")
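Answer 0 (score: 1)
I assume the difficulty you are having is grouping by SERIAL. data.table makes this easy. You really should include what you have tried and where you are stuck.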
library(data.table)
dt = data.table(serial = c(rep(1,4), rep(2,4)), empstat = c(32,rep(0,7)))
Here is what the data.table looks like before adding Employed:
serial empstat
1: 1 32
2: 1 0
3: 1 0
4: 1 0
5: 2 0
6: 2 0
7: 2 0
8: 2 0
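Then you can run this to check whether anyone in each serial group has an EMPSTAT of 12 or 32. The toy data only contains 32 and 0, so 32 is used here to make the result visible; for the real data, substitute the codes from the question (10 and 12).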
dt[ , "Employed" := ifelse(any(empstat %in% c(12,32)),1,0), by = .(serial)]
serial empstat Employed
1: 1 32 1
2: 1 0 1
3: 1 0 1
4: 1 0 1
5: 2 0 0
6: 2 0 0
7: 2 0 0
8: 2 0 0
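As a minimal sketch of how the same idea might look with the question's own column names and employment codes (10 and 12), the example below builds WorkingFamily directly. The `persons` table and its values are hypothetical toy data, not the IPUMS extract, and `employed_codes` is just an illustrative name. A base R variant using `ave()` is included for comparison.

```r
library(data.table)

# Hypothetical toy data using the question's column names (SERIAL, EMPSTAT)
persons <- data.table(
  SERIAL  = c(1L, 1L, 2L, 2L, 3L),
  EMPSTAT = c(10L, 32L, 32L, 0L, 12L)
)

# Codes the question treats as employed
employed_codes <- c(10L, 12L)

# any() yields one TRUE/FALSE per household; as.integer() turns it into 1/0,
# and data.table recycles that single value to every member of the household
persons[, WorkingFamily := as.integer(any(EMPSTAT %in% employed_codes)), by = SERIAL]

# Base R alternative: flag each person, then take the household maximum
persons_df <- as.data.frame(persons)
persons_df$WorkingFamilyBase <- ave(
  as.integer(persons_df$EMPSTAT %in% employed_codes),
  persons_df$SERIAL,
  FUN = max
)

print(persons_df)
```

To apply this to the data frame from the question, it should be enough to convert it first with `setDT(IPUMS.SDdata.MC)` and then run the same `:=` line on it. Using `as.integer(any(...))` instead of `ifelse()` is a stylistic choice; both give a 1/0 column.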