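I am working with a panel dataset covering 10 time periods.
I need to create a dummy variable RL that stays equal to 1 forever once the dummy variable RS has been 1 (TRUE) at any point.
In other words: the new variable RL (spanning the 10 periods) must be 1 in period t, and in all following periods, if RS was 1 in period t-1. If TRUE never occurs in RS and RS is 0 (FALSE), then RL should also be 0.
If TRUE occurs in RS in period t, then RL must be 1 from then on (in t + 1, t + 2, t + 3, t + 4, ..., up to the end of the panel).
My problem is that FALSE is not read correctly as 0, but only as NA.
I used ifelse, but it gives me too many blanks: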
df$r_1RL <- rep(0, nrow(df)) # always 0: no one can retire in t-1, since "RS0" does not exist
df$r_2RL <- ifelse( df$r_1RS == 1, 1, ifelse(df$r_1RS == 0, 0, NA))
df$r_3RL <- ifelse( (df$r_1RS == 1 | df$r_2RS == 1), 1, ifelse( (df$r_1RS == 0 | df$r_2RS == 0), 0, NA))
df$r_4RL <- ifelse( (df$r_1RS == 1 | df$r_2RS == 1 | df$r_3RS == 1), 1, ifelse( (df$r_1RS == 0 | df$r_2RS == 0 | df$r_3RS == 0), 0, NA))
df$r_5RL <- ifelse( (df$r_1RS == 1 | df$r_2RS == 1 | df$r_3RS == 1 | df$r_4RS == 1 ), 1, ifelse( (df$r_1RS == 0 | df$r_2RS == 0 | df$r_3RS == 0 | df$r_4RS == 0), 0, NA))
and so on, up to r_10RL.
df <- structure(list(r_1RS = c(FALSE, FALSE, FALSE, FALSE, FALSE, NA
), r_2RS = c(FALSE, NA, FALSE, FALSE, FALSE, NA), r_3RS = c(FALSE,
FALSE, FALSE, FALSE, FALSE, NA), r_4RS = c(FALSE, FALSE, FALSE,
FALSE, NA, FALSE), r_5RS = c(FALSE, TRUE, FALSE, FALSE, NA, FALSE
), r_6RS = c(FALSE, FALSE, FALSE, FALSE, NA, TRUE), r_7RS = c(FALSE,
FALSE, FALSE, FALSE, NA, FALSE), r_8RS = c(TRUE, FALSE, FALSE,
FALSE, FALSE, FALSE), r_9RS = c(FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE), r_10RS = c(FALSE, FALSE, TRUE, FALSE, NA, FALSE), r_1RL = c(0,
0, 0, 0, 0, 0), r_2RL = c(0, 0, 0, 0, 0, NA), r_3RL = c(0, NA,
0, 0, 0, NA), r_4RL = c(0, NA, 0, 0, 0, NA), r_5RL = c(0, NA,
0, 0, NA, NA), r_6RL = c(0, 1, 0, 0, NA, NA), r_7RL = c(0, 1,
0, 0, NA, 1), r_8RL = c(0, 1, 0, 0, NA, 1), r_9RL = c(1, 1, 0,
0, NA, 1), r_10RL = c(1, 1, 0, 0, NA, 1)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
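Here you can see where TRUE occurs in RS and that RL is 1 afterwards. But there are two problems. First, the 1 in r_10RL should be NA, and r_7RL should be 0 rather than what it shows now.
The circled NAs should be 0, and the circled NA should be 1.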
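The extra NA values are consistent with how R handles missing values in logical tests: comparing NA with anything gives NA, FALSE | NA is NA, and ifelse returns NA wherever its test is NA, so the inner ifelse is never reached for those rows. A minimal illustration, using two made-up scalar values (rs1 and rs2 are hypothetical names, not columns of df):

```r
# Hypothetical single-row values, not taken from df
rs1 <- FALSE
rs2 <- NA

rs1 == 1                # FALSE
rs2 == 1                # NA: comparing NA with anything gives NA
rs1 == 1 | rs2 == 1     # NA: FALSE | NA is unknown
TRUE | NA               # TRUE: one TRUE is enough to decide an OR

# The test is NA, so ifelse returns NA and never looks at the "no" branch
bad <- ifelse(rs1 == 1 | rs2 == 1, 1, 0)
bad                     # NA

# %in% never returns NA, so it can serve as an NA-safe test
# (this treats NA as "not TRUE", which is an assumption about the data)
ok <- ifelse(rs1 %in% TRUE | rs2 %in% TRUE, 1, 0)
ok                      # 0
```

Whether NA should count as "no event" is a modelling decision; the %in% variant only shows one way to keep NA from spreading through the OR chain.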
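Answer 0 (score: 1)
This feels hacky and I don't love it, but it works for your example data. You could take the general idea and make it more efficient. Let me know if you run into any problems!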
# Using the first 10 columns of your dput dataframe
df <- df[1:10]
> df
# A tibble: 6 x 10
r_1RS r_2RS r_3RS r_4RS r_5RS r_6RS r_7RS r_8RS r_9RS r_10RS
<lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
2 FALSE NA FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
5 FALSE FALSE FALSE NA NA NA NA FALSE FALSE NA
6 NA NA NA FALSE FALSE TRUE FALSE FALSE FALSE FALSE
# Creating a copy for the new columns
df2 <- df
# There may be other ways to handle NAs, but you mentioned you want them
# as zero, so this should work for you
df2[is.na(df2)] <- 0
# Changing all values after TRUE to 1
df2 <- data.frame(t(apply(df2, 1, function(x) as.numeric(cumsum(x) > 0))))
# Changing the names
names(df2) <- sub("RS", "RL", names(df), fixed = TRUE)
# Combining the columns
> cbind(df, df2)
r_1RS r_2RS r_3RS r_4RS r_5RS r_6RS r_7RS r_8RS r_9RS r_10RS r_1RL r_2RL r_3RL r_4RL r_5RL r_6RL r_7RL r_8RL r_9RL r_10RL
1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE 0 0 0 0 0 0 0 1 1 1
2 FALSE NA FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE 0 0 0 0 1 1 1 1 1 1
3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 0 0 0 0 0 0 0 0 0 1
4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 0 0 0 0 0 0 0 0 0 0
5 FALSE FALSE FALSE NA NA NA NA FALSE FALSE NA 0 0 0 0 0 0 0 0 0 0
6 NA NA NA FALSE FALSE TRUE FALSE FALSE FALSE FALSE 0 0 0 0 0 1 1 1 1 1
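Note that the question defines RL in period t from RS in periods up to t-1, while the output above switches RL to 1 in the same period as the first TRUE. If the one-period lag is wanted, a minimal sketch is to shift the result one column to the right. The small matrix rs below is made up for illustration, and treating NA as FALSE is an assumption:

```r
# Made-up 3 x 4 example (rows = units, columns = periods); not the question's data
rs <- matrix(c(FALSE, TRUE,  FALSE, FALSE,
               FALSE, FALSE, FALSE, FALSE,
               NA,    FALSE, TRUE,  FALSE),
             nrow = 3, byrow = TRUE)

rs[is.na(rs)] <- FALSE                 # assumption: NA means "no event"
ever <- t(apply(rs, 1, cumsum)) > 0    # TRUE from the first event onward

# Shift one period to the right: period 1 is always 0,
# and the last period of `ever` is dropped
rl <- cbind(0, ever[, -ncol(ever), drop = FALSE] * 1)
rl
```

Here row 1 has its event in period 2, so its RL is 0, 0, 1, 1; row 3 has its event in period 3, so only its last RL value is 1.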
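Edit:
I just read the last few lines of your post. If you want to keep the NAs in the new columns, just put df2[is.na(df)] <- NA before the cbind. It isn't clear to me exactly what you want, so if this isn't it, could you post a data frame containing the desired output for the sample data? If you run into other problems, leave a comment or post an update!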
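As a sketch of what df2[is.na(df)] <- NA does, here is the same pipeline on a made-up two-row data frame (rs and rl are hypothetical stand-ins for df and df2):

```r
# Made-up example data; a plain data.frame rather than the tibble from the question
rs <- data.frame(r_1RS = c(FALSE, NA),
                 r_2RS = c(TRUE,  FALSE),
                 r_3RS = c(NA,    FALSE))

rl <- rs
rl[is.na(rl)] <- 0
rl <- data.frame(t(apply(rl, 1, function(x) as.numeric(cumsum(x) > 0))))
names(rl) <- sub("RS", "RL", names(rs), fixed = TRUE)

# Put NA back wherever the source cell was NA
rl[is.na(rs)] <- NA
rl
```

Row 1 ends up as 0, 1, NA: the NA in period 3 is restored even though the event already happened in period 2. Whether that cell should be NA or 1 depends on what a missing RS value is supposed to mean, which the question leaves open.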
EDIT2:
Here is another way to do the step that used apply (which can be slow). I couldn't test which approach is faster, so I wanted to include both:
# Changing all values after TRUE to 1
df2[] <- lapply(df2, as.numeric)
df2_t <- data.frame(t(df2))
> data.frame(t(cumsum(df2_t) > 0)*1)
r_1RS r_2RS r_3RS r_4RS r_5RS r_6RS r_7RS r_8RS r_9RS r_10RS
X1 0 0 0 0 0 0 0 1 1 1
X2 0 0 0 0 1 1 1 1 1 1
X3 0 0 0 0 0 0 0 0 0 1
X4 0 0 0 0 0 0 0 0 0 0
X5 0 0 0 0 0 0 0 0 0 0
X6 0 0 0 0 0 1 1 1 1 1
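A third possibility, offered only as an untested-for-speed sketch, is to avoid both apply and the transposes by accumulating a logical OR across the columns with base R's Reduce. The data frame rs is made up for illustration, and NA is again assumed to mean "no event":

```r
# Made-up example data, not the question's df
rs <- data.frame(r_1RS = c(FALSE, TRUE,  NA),
                 r_2RS = c(FALSE, FALSE, FALSE),
                 r_3RS = c(TRUE,  FALSE, FALSE))

# Replace NA with FALSE column by column (assumption: NA means "no event")
clean <- lapply(rs, function(x) !is.na(x) & x)

# Running OR over the columns: element k is TRUE where any of columns 1..k was TRUE
ever <- Reduce(`|`, clean, accumulate = TRUE)
names(ever) <- sub("RS", "RL", names(rs), fixed = TRUE)

rl <- as.data.frame(lapply(ever, as.numeric))
rl
```

Each step works on whole columns, so the number of R-level iterations equals the number of periods rather than the number of rows; whether that is actually faster on a given panel would need to be measured.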