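I want to split the numeric variable household income into 3 categories: low, middle, high.
The cutoffs for all 3 income groups depend on whether the household is a single household or a non-single household: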
                          low     middle        high
1. Single household       860     861 – 1844    >1845
2. Non Single household   1900    979 – 4242    >4242
The variables of interest are the person ID (pid) and the household ID (hid). For example:
year pid hid household income
1990 201 1 1000
1991 201 1 1000
1992 201 1 2000
1990 202 1 2000
1991 202 1 3000
1992 202 1 4000
1990 3000 2 5000
1991 3000 2 ..
1992 3000 2
1990 1000 3
1991 1000 3
1992 1000 3
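I want to determine whether a household is a single household and then add the corresponding income group. I tried to create an empty vector "Income" and fill it in: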
data_s1<- within(data,{
Income<-NA
Income[income <900 & single household ]<-low
Income[income<1900 & nonsingle household]<-low
Income[income %in% 861:1844 & single household]<-middle
Income[income %in% 979:4242 & nonsingle household ]<-middle
Income[income >1845 & single household ]<-high
Income[income >4242 & nonsingle household ]<-high
})
I am having trouble implementing this logic.
Answer 0 (score: 0)
You could try the following approach:
# define the cutoffs per group
single <- c(0, 860, 1844, Inf)
nonsingle <- c(0, 1900, 4242, Inf)
# define the group labels
l <- c("low", "middle", "high")
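# flag households that have exactly 1 distinct pid (== single household);
# ave() returns a vector of pid's type, so the flag is stored as 0/1 rather than TRUE/FALSE
df$singlehousehold <- with(df, ave(pid, hid, FUN = function(x) length(unique(x)) == 1L))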
# split the data according to singlehousehold and cut the income into groups. Then rbind back together
df <- do.call(rbind, lapply(split(df, df$singlehousehold), function(x) {
  if (x$singlehousehold[1]) {
    x$incomeclass <- cut(x[, "household income"], single, labels = l)
    x
  } else {
    x$incomeclass <- cut(x[, "household income"], nonsingle, labels = l)
    x
  }
}))
rownames(df) <- NULL # to reset the row names
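Note that the split/rbind approach above returns the rows grouped by the single-household flag, so the original row order is lost. If that matters, the same classification can probably be done without splitting. The following is only a sketch under a few assumptions: the income column is named `income` instead of "household income", the sample values for pid 3000 are made up for illustration (they are elided in the question), and the cutoffs are the ones used in the answer above. `include.lowest = TRUE` is added so that an income of exactly 0 is classified as "low" instead of becoming NA.

```r
# Illustrative data modelled on the question; `income` is an assumed column name,
# and the last three income values are invented for the example.
df <- data.frame(
  year   = rep(1990:1992, times = 3),
  pid    = rep(c(201, 202, 3000), each = 3),
  hid    = rep(c(1, 1, 2), each = 3),
  income = c(1000, 1000, 2000, 2000, 3000, 4000, 5000, 800, 1500)
)

# A household is "single" if it contains exactly one distinct pid.
df$single <- ave(df$pid, df$hid, FUN = function(x) length(unique(x))) == 1

# Cutoffs per household type (same values as in the answer above).
labels           <- c("low", "middle", "high")
single_breaks    <- c(0, 860, 1844, Inf)
nonsingle_breaks <- c(0, 1900, 4242, Inf)

# Classify every row under both schemes, then pick the one that applies.
cls_single    <- cut(df$income, single_breaks,    labels = labels, include.lowest = TRUE)
cls_nonsingle <- cut(df$income, nonsingle_breaks, labels = labels, include.lowest = TRUE)

df$incomeclass <- factor(
  ifelse(df$single, as.character(cls_single), as.character(cls_nonsingle)),
  levels = labels
)

print(df)
```

Because `cut()` uses right-closed intervals by default, an income equal to a cutoff (for example 860 for a single household) falls into the lower group; adjust the breaks if your definition of the groups differs.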