How do I split a numeric variable into categories?

Asked: 2015-01-03 12:54:36

Tags: r

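I want to split the numeric variable household income into 3 categories: low, middle, high.

The cutoffs for all 3 income groups depend on whether the household is a single household or a non-single household: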

                              low            middle             high
  1. Single household      860             861 – 1844           >1845
  2. Non-single household  1900            979 – 4242           >4242

The variables of interest are the person ID (pid) and the household ID (hid). For example:

         year    pid                hid               household income
         1990     201                 1                 1000
         1991     201                 1                 1000
         1992     201                 1                 2000
         1990     202                 1                 2000
         1991     202                 1                 3000
         1992     202                 1                 4000  
         1990     3000                2                 5000
         1991     3000                2                  ..
         1992     3000                2
         1990     1000                3
         1991     1000                3
         1992     1000                3

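I want to determine whether each household is a single household and then add the corresponding income group. My idea was to create an empty vector "Income" and fill it in: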

            data_s1<- within(data,{
                           Income<-NA
                             Income[income <900 & single household ]<-low
                             Income[income<1900 & nonsingle household]<-low
                             Income[income %in%  861:1844  & single household]<-middle
                             Income[income %in%  979:4242 & nonsingle household ]<-middle
                             Income[income >1845 & single household  ]<-high
                             Income[income >4242 & nonsingle household  ]<-high
})

However, I am having trouble implementing this logic.

1 answer:

Answer 0 (score: 0)

You could try the following approach:

# define the cutoffs per group
single <- c(0, 860, 1844, Inf) 
nonsingle <- c(0, 1900, 4242, Inf)
# define the group labels 
l <- c("low", "middle", "high") 
# check if household has exactly 1 pid (==singlehousehold)
df$singlehousehold <- with(df, ave(pid, hid, FUN = function(x) length(unique(x)) == 1L))
# split the data by singlehousehold, cut the income into groups, then rbind back together
df <- do.call(rbind, lapply(split(df, df$singlehousehold), function(x) {
  if (x$singlehousehold[1]) {
    x$incomeclass <- cut(x[, "household income"], single, labels = l)
  } else {
    x$incomeclass <- cut(x[, "household income"], nonsingle, labels = l)
  }
  x
}))
rownames(df) <- NULL   # to reset the row names
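
If keeping the original row order matters, the same idea can probably be written without split/rbind. The following is only a sketch under a few assumptions: the income column is renamed to `income`, the example values for household 2 are made up (the question elides them), and `include.lowest = TRUE` is added so that an income of exactly 0 still lands in "low". The cutoffs are the ones used in the answer above.

```r
# Example data loosely based on the question; the last two incomes are invented
df <- data.frame(
  year   = rep(1990:1992, times = 3),
  pid    = rep(c(201, 202, 3000), each = 3),
  hid    = rep(c(1, 1, 2), each = 3),
  income = c(1000, 1000, 2000, 2000, 3000, 4000, 5000, 900, 500)
)

labels           <- c("low", "middle", "high")
single_breaks    <- c(0, 860, 1844, Inf)
nonsingle_breaks <- c(0, 1900, 4242, Inf)

# TRUE when a household (hid) contains exactly one distinct person (pid)
df$single <- ave(df$pid, df$hid, FUN = function(x) length(unique(x))) == 1

# Classify every row against both sets of cutoffs, then pick the right one per row
cls_single    <- cut(df$income, single_breaks, labels = labels, include.lowest = TRUE)
cls_nonsingle <- cut(df$income, nonsingle_breaks, labels = labels, include.lowest = TRUE)
df$incomeclass <- factor(
  ifelse(df$single, as.character(cls_single), as.character(cls_nonsingle)),
  levels = labels
)
```

Because nothing is split and recombined here, the rows stay in their original order and no row-name reset is needed. Note that cut() uses right-closed intervals by default, so an income of exactly 860 counts as "low" for a single household.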