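I have a data frame / tibble containing yearly observations for several countries. In years in which a specific event takes place, the variable event takes the value 1.
I am now trying to create a new column, event.10yrs, which takes the value 1 in the 9 years after an event ends (counted from the last year of the event if it lasts several years). In years in which a new event occurs and which are not the last year of that new event, event.10yrs takes the value 0.
Below is the data for a single country. The column event.10yrs is the desired output.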
df <-structure(list(year = c(1970, 1971, 1972, 1973, 1974, 1975, 1976,
1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987,
1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012, 2013, 2014, 2015), ccode = c(516, 516, 516,
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516,
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516,
516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516, 516,
516, 516, 516, 516), event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, NA, NA), event.last.y = c(0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, NA,
NA, NA, NA, NA), event.10yrs = c(NA, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, NA, NA, NA)), row.names = c(NA,
-46L), vars = "ccode", drop = TRUE, class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), indices = list(0:45), group_sizes = 46L, biggest_group_size = 46L, labels = structure(list(
ccode = 516), row.names = c(NA, -1L), vars = "ccode", drop = TRUE, class = "data.frame", .Names = "ccode"), .Names = c("year",
"ccode", "event", "event.last.y", "event.10yrs"))
So far, I have tried using the dplyr package:
df <- df %>%
mutate(event.10yrs=case_when(event!=1 & year-9 < year[event.last.y==1] ~ 1,
TRUE ~ 0))
However, this produces the following warning:
Warning message:
In year < year[rs.war.last.y == 1] :
longer object length is not a multiple of shorter object length
Thanks for any hints.
Answer 0 (score: 1)
Maybe just a nested ifelse (or dplyr::if_else):
library(dplyr)
df %>% mutate(ev_10 = if_else(event == 0, 1,
if_else(event.last.y ==1, 1, 0),
0))
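As for the warning in the question: year[event.last.y == 1] holds one element per event end (three in the sample data, plus NA entries from the NA rows), so comparing it with the full year column forces R to recycle the shorter vector. The following is a minimal base R sketch on made-up toy data, not the question's data; the names last_end and within are hypothetical. It reproduces the warning and shows one possible alternative, assuming the rows are sorted by year: carry the most recent event end forward with cummax() so that both sides of the comparison have the same length.

```r
# Toy data (assumption): 15 consecutive years, events ending in 1972 and 1983
year         <- 1970:1984
event.last.y <- as.numeric(year %in% c(1972, 1983))

# The subset has one element per event end, not one per row
rhs <- year[event.last.y == 1]
length(rhs)

# Comparing 15 values with 2 values recycles the shorter vector and warns;
# tryCatch() is used here only to capture the warning text
msg <- tryCatch(year - 9 < rhs, warning = function(w) conditionMessage(w))

# Alternative: most recent event end seen so far (-Inf before the first one)
last_end <- cummax(ifelse(event.last.y == 1, year, -Inf))

# TRUE in an event's last year and in the 9 years after it
within <- (year - last_end) <= 9
data.frame(year, event.last.y, last_end, within)
```

This only marks the window; the extra rule that years of a new, ongoing event get 0 still has to be applied on top of it.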
Edit
This post helped me here: Find the index position of the first non-NA value in an R vector?
But we don't only want to replace the first occurrence of 'x'...
So I used a helper column:
index_1 <- unlist(lapply(which(df$event.last.y ==1 ),
function(x) seq(x, length.out=9)))
# index_1 holds, for each row where event.last.y == 1, that row's index
# plus the 8 rows after it (9 positions in total, all assumed to be <= nrow(df))
df$last_code <- df$event.last.y #just to duplicate your column
df$last_code[index_1] <- 1 #replacing the indices with '1'
Now we can use a simple nested conditional, as before:
df <- df %>% mutate(ev_10 = if_else(event == 0 & last_code==1, 1,
#added the condition that last_code needs to be '1'
if_else(event.last.y ==1, 1, 0),
0))
head(df[c(2:13, 31:40),], 20) #printing only example rows here
# A tibble: 20 x 7
# Groups: ccode [1]
year ccode event event.last.y event.10yrs last_code ev_10
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1971 516 0 0 0 0 0
2 1972 516 1.00 1.00 1.00 1.00 1.00
3 1973 516 0 0 1.00 1.00 1.00
4 1974 516 0 0 1.00 1.00 1.00
5 1975 516 0 0 1.00 1.00 1.00
6 1976 516 0 0 1.00 1.00 1.00
7 1977 516 0 0 1.00 1.00 1.00
8 1978 516 0 0 1.00 1.00 1.00
9 1979 516 0 0 1.00 1.00 1.00
10 1980 516 0 0 1.00 1.00 1.00
11 1981 516 0 0 1.00 0 0
12 1982 516 0 0 0 0 0
...
13 2000 516 1.00 0 0 1.00 0
14 2001 516 1.00 0 0 1.00 0
15 2002 516 1.00 0 0 1.00 0
16 2003 516 1.00 1.00 1.00 1.00 1.00
17 2004 516 0 0 1.00 1.00 1.00
18 2005 516 0 0 1.00 1.00 1.00
19 2006 516 0 0 1.00 1.00 1.00
20 2007 516 0 0 1.00 1.00 1.00
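
Note that seq(x, length.out = 9) covers the event's last year plus the 8 rows after it, which is why 1981 gets 0 in ev_10 above while event.10yrs is 1. The helper-column idea can be wrapped in a small function with the window length as a parameter. This is only a sketch under stated assumptions: mark_after_event and n_after are hypothetical names, the data are a toy vector rather than the question's data, there is one row per consecutive year for a single country, and there are no NA values.

```r
# Hypothetical helper (not part of the original answer).
# Assumes one row per consecutive year, a single country, and no NAs.
mark_after_event <- function(event, last_year, n_after = 9) {
  n <- length(event)
  # Row positions where an event ends
  ends <- which(last_year == 1)
  # Each end position plus the n_after rows after it, clipped to the data length
  idx <- unlist(lapply(ends, function(i) seq(i, min(i + n_after, n))))
  in_window <- seq_len(n) %in% idx
  # Inside a window, but not in a year of an ongoing event
  # (unless that year is the event's last year)
  as.numeric(in_window & (event == 0 | last_year == 1))
}

# Toy data: a one-year event, then a two-year event starting inside the window
event     <- c(0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
last_year <- c(0, 1, 0, 0, 1, 0, 0, 0, 0, 0)

flag <- mark_after_event(event, last_year, n_after = 3)
data.frame(event, last_year, flag)
```

With several countries in one table, the function would need to be applied per country, for example inside a dplyr group_by(ccode) followed by mutate(); rows with NA in event or event.last.y would need separate handling.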