Create a new variable based on matching IDs and a defined condition

Time: 2017-03-06 20:56:30

Tags: r

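How can I create a new indicator variable, call it WorkingFamily, that equals 1 if any individual in the household is employed and 0 otherwise?

Individuals with the same SERIAL belong to the same household.

Employed individuals are those with EMPSTAT == 10 or EMPSTAT == 12.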

> dput(head(IPUMS.SDdata.MC))
structure(list(YEAR = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L
), SERIAL = c(89076L, 89077L, 89078L, 89079L, 89080L, 89104L), 
HWTSUPP = c(30187500L, 30183100L, 28600900L, 21051300L, 31378100L, 
17928900L), ASECFLAG = c(1L, 1L, 1L, 1L, 1L, 1L), COUNTY = c(6073L, 
6073L, 6073L, 6073L, 6073L, 6073L), MONTH = c(3L, 3L, 3L, 
3L, 3L, 3L), PERNUM = c(1L, 1L, 1L, 1L, 1L, 3L), WTSUPP = c(30187500L, 
30183100L, 28600900L, 21051300L, 31378100L, 17497400L), FAMSIZE = c(1L, 
1L, 1L, 1L, 1L, 4L), EMPSTAT = c(32L, 32L, 32L, 32L, 32L, 
0L), HIMCAID = c(2L, 2L, 2L, 2L, 2L, 2L), PID = c("2016 3 89076 1", 
"2016 3 89077 1", "2016 3 89078 1", "2016 3 89079 1", "2016 3 89080 1", 
"2016 3 89104 3"), WTSUPP2 = c(3018.75, 3018.31, 2860.09, 
2105.13, 3137.81, 1749.74)), .Names = c("YEAR", "SERIAL", 
"HWTSUPP", "ASECFLAG", "COUNTY", "MONTH", "PERNUM", "WTSUPP", 
"FAMSIZE", "EMPSTAT", "HIMCAID", "PID", "WTSUPP2"), row.names = c(174187L, 
174188L, 174189L, 174190L, 174191L, 174248L), class = "data.frame")


1 Answer:

Answer 0: (score: 1)

I'm assuming the difficulty you're having is grouping by SERIAL. data.table makes this easy. You really should include what you have tried and where you are stuck.

library(data.table)
dt = data.table(serial = c(rep(1,4), rep(2,4)), empstat = c(32,rep(0,7)))

This is what the data.table looks like before adding Employed:
   serial empstat
1:      1      32
2:      1       0
3:      1       0
4:      1       0
5:      2       0
6:      2       0
7:      2       0
8:      2       0

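Then you can run this to check, within each serial, whether anyone has an EMPSTAT of 12 or 32. Note that the question defines employment as EMPSTAT 10 or 12, so use c(10, 12) on the real data; in this toy example 32 is the only value that produces a match.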

# := modifies dt in place; the trailing [] prints the updated table
dt[, "Employed" := ifelse(any(empstat %in% c(12, 32)), 1, 0), by = .(serial)][]
   serial empstat Employed
1:      1      32        1
2:      1       0        1
3:      1       0        1
4:      1       0        1
5:      2       0        0
6:      2       0        0
7:      2       0        0
8:      2       0        0
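
The same grouping idea can be sketched with the question's own column names and its definition of employment (EMPSTAT 10 or 12). This is only an illustration under stated assumptions: the `people` table and its values are made up here, not taken from the IPUMS extract, and `as.integer(any(...))` is used in place of `ifelse()` as one possible way to get a 0/1 flag.

```r
library(data.table)

# Hypothetical person-level rows using the question's column names;
# the values are invented for illustration only.
people <- data.table(
  SERIAL  = c(1L, 1L, 2L, 2L, 3L),
  EMPSTAT = c(10L, 32L, 32L, 0L, 12L)
)

# Every member of a household gets 1 if anyone in that household
# has EMPSTAT 10 or 12, and 0 otherwise.
people[, WorkingFamily := as.integer(any(EMPSTAT %in% c(10L, 12L))), by = SERIAL]

print(people)
```

For the data frame in the question, converting it first with `as.data.table()` (or `setDT()`, which converts in place) should be enough to run the same grouped assignment.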