I have a data frame like this:
year id employment
1998 1 0
2000 1 0
2002 1 0
2004 1 0
1998 2 0
2000 2 0
2002 2 1
2004 2 1
1998 3 0
2000 3 1
2002 3 1
2004 3 1
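I want to create a new variable "spell" that indicates when each individual moves from unemployed (employment = 0) to employed (employment = 1). In other words, I want something of this form: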
year id employment spell
1998 1 0 0
2000 1 0 0
2002 1 0 0
2004 1 0 0
1998 2 0 3
2000 2 0 3
2002 2 1 3
2004 2 1 3
1998 3 0 2
2000 3 1 2
2002 3 1 2
2004 3 1 2
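As you can see, the variable "spell" is 0 for individual 1 because he never finds a job (employment stays equal to zero in all of his observations). Individual 2, on the other hand, finds a job at the third observation (year = 2002), so his spell equals 3, while individual 3 finds a job at the second observation (year = 2000), so his spell equals 2. Does anyone have a suggestion for how to do this? Thank you very much for your time.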
Answer 0: (score: 0)
This snippet assumes that your data is in df and that id is a consecutive integer starting at 1:
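# assume your data is in df
# one row per year, one employment column per id
df1 <- reshape(df, idvar="year", timevar="id", direction="wide")
pivoted <- subset(df1, select = -c(year))
# period-to-period change: 1 marks a move from 0 to 1
m <- diff(as.matrix(pivoted))
m[is.na(m)] <- 0
# flag every period from the first transition onward, then count them
df2 <- apply(m, 2, cummax)
df3 <- apply(df2, 2, cumsum)
x <- df3[nrow(df3),]
# convert that count into the observation number of the transition
y <- 1 + nrow(df1) - x
# ids that never make the transition get 0
y[y == as.numeric(1+nrow(df1))] <- 0
# assign new column
df$spell <- y[df$id]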
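The last line indexes y by df$id, which is why the ids have to be 1, 2, 3, and so on. A minimal sketch of one way to relax that, assuming the wide columns come out in order of first appearance of each id: look up each id's position with match() instead. The ids 10, 20, 30 are made up for illustration and are not from the question.

```r
# Hypothetical data: same layout as the question, but ids are 10, 20, 30
df <- data.frame(
  year = rep(c(1998, 2000, 2002, 2004), times = 3),
  id = rep(c(10, 20, 30), each = 4),
  employment = c(0, 0, 0, 0,  0, 0, 1, 1,  0, 1, 1, 1)
)

# Same pipeline as in the answer above
df1 <- reshape(df, idvar = "year", timevar = "id", direction = "wide")
pivoted <- subset(df1, select = -c(year))
m <- diff(as.matrix(pivoted))
m[is.na(m)] <- 0
df3 <- apply(apply(m, 2, cummax), 2, cumsum)
x <- df3[nrow(df3), ]
y <- 1 + nrow(df1) - x
y[y == as.numeric(1 + nrow(df1))] <- 0

# Map each id to its column position instead of using the id as an index
df$spell <- unname(y[match(df$id, unique(df$id))])
df
```

With ids 10, 20, 30, the original y[df$id] would index far past the end of y and return NA, whereas the match() lookup returns positions 1, 2, 3.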
Answer 1: (score: 0)
Here is a base R option:
transform(DF, spell = ave(
employment,
id,
FUN = function(x)
ifelse(all(x == 0), 0, which(cumsum(x) == 1))
))
# year id employment spell
#1 1998 1 0 0
#2 2000 1 0 0
#3 2002 1 0 0
#4 2004 1 0 0
#5 1998 2 0 3
#6 2000 2 0 3
#7 2002 2 1 3
#8 2004 2 1 3
#9 1998 3 0 2
#10 2000 3 1 2
#11 2002 3 1 2
#12 2004 3 1 2
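The basic idea is to find the position of the first 1 within each id group using which(cumsum(x) == 1). However, since there is no 1 at all in the id == 1 group, we need ifelse to handle that case.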
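To make the two steps concrete, here is a small sketch on single vectors rather than on the grouped data frame. The helper name first_job is made up for illustration; its body is the function passed to ave above.

```r
x <- c(0, 0, 1, 1)            # employment history of id 2
cumsum(x)                     # running total: 0 0 1 2
which(cumsum(x) == 1)         # position where the total first reaches 1: 3

none <- c(0, 0, 0, 0)         # employment history of id 1
which(cumsum(none) == 1)      # empty result, hence the guard

# Hypothetical wrapper around the function used in the answer
first_job <- function(x) ifelse(all(x == 0), 0, which(cumsum(x) == 1))
first_job(x)
first_job(none)

# If someone becomes unemployed again, which() returns several positions;
# ifelse() with a length-one test keeps only the first of them
first_job(c(0, 1, 0, 0))
```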
Data
DF <- structure(list(year = c(1998L, 2000L, 2002L, 2004L, 1998L, 2000L,
2002L, 2004L, 1998L, 2000L, 2002L, 2004L), id = c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), employment = c(0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L)), .Names = c("year", "id",
"employment"), class = "data.frame", row.names = c(NA, -12L))
Answer 2: (score: 0)
And another one :)
# create data
vals = c(1998, 1, 0, 2000, 1, 0, 2002, 1, 0, 2004, 1, 0, 1998, 2, 0, 2000, 2, 0, 2002, 2, 1, 2004, 2, 1, 1998, 3, 0, 2000, 3, 1, 2002, 3, 1, 2004, 3, 1)
vals = matrix(vals, nrow = length(vals)/3, ncol = 3, byrow = TRUE)
data = data.frame(vals)
names(data) = c("year", "id", "employment")
data
# create desired variable: for anyone who is ever employed, the number of
# 0 observations plus 1; otherwise 0
# (assumes employment never returns to 0 once it is 1)
spell = tapply(data$employment, data$id, function(f) ifelse(sum(f == 1, na.rm = TRUE) > 0, sum(f == 0, na.rm = TRUE) + 1, 0))
spell = data.frame(spell)
spell$id = rownames(spell)
# attach the per-id value to every row of that id
data = merge(data, spell, by = "id")
data
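Counting the zeros and adding one gives the position of the first 1 only when nobody goes back to employment = 0 after finding a job, which holds for the sample data. The sketch below compares that rule with a direct lookup of the first 1 via match(). The names count_rule and first_one are made up for illustration, and the "broken" history is hypothetical.

```r
# The rule from the tapply call above
count_rule <- function(f) ifelse(sum(f == 1, na.rm = TRUE) > 0, sum(f == 0, na.rm = TRUE) + 1, 0)
# Direct lookup: position of the first 1, or 0 if there is none
first_one <- function(f) match(1, f, nomatch = 0L)

steady <- c(0, 0, 1, 1)   # like id 2 in the question
broken <- c(0, 1, 0, 1)   # hypothetical: employed, unemployed, employed again

count_rule(steady)        # 3
first_one(steady)         # 3
count_rule(broken)        # 3, although the first job starts at observation 2
first_one(broken)         # 2
```

If histories like the second one can occur in the real data, the direct lookup may be the safer choice.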