I have the following df:
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))
What I want to do is create a new variable, before_after, that is 0 for the years before a country's first non-NA value of score and 1 from that year onward.
In other words, hard-coding it, I want it to return the following df:
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA),
before_after = c(0,0,0,0,1,1,1,1,1,1,1))
I tried the following code, but to no avail:
df %>%
arrange(year) %>%
group_by(country) %>%
mutate(before_after = ifelse(which.max(!is.na(score)),1,0)) %>%
arrange(country, year)
A tidyverse solution would be great, but really, any help is appreciated.
Thanks!
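Why the attempt above gives 1 on every row: which.max(!is.na(score)) returns a single position (the first non-NA row), not one value per row, so ifelse() sees one nonzero number, returns a single 1, and mutate() recycles it across the whole group. A minimal sketch of the difference, using the US scores as a plain vector (x, first_idx, attempt and per_row are hypothetical names used only for illustration):

```r
# US scores from the question, in year order
x <- c(NA, NA, NA, NA, 426, NA, NA, 430, NA)

# which.max() on a logical vector returns ONE index: the first TRUE
first_idx <- which.max(!is.na(x))

# ifelse() with a length-1 test gives a length-1 result, which mutate() would recycle
attempt <- ifelse(first_idx, 1, 0)

# Comparing each row's position with that index gives one 0/1 flag per row
per_row <- as.numeric(seq_along(x) >= first_idx)
print(per_row)  # 0 0 0 0 1 1 1 1 1
```

One caveat with this index-based fix: for a group with no non-NA score at all, which.max() of an all-FALSE vector still returns 1, so every row would be marked 1. The cumsum-based answer does not have that problem.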
Answer 0 (score: 2)
You can use cumsum:
library(dplyr)
df %>%
arrange(country, year) %>%
group_by(country) %>%
mutate(before_after = ifelse(cumsum(!is.na(score)) > 0, 1, 0))
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA 1
3 US 1999 NA 0
4 US 2000 NA 0
5 US 2001 NA 0
6 US 2002 NA 0
7 US 2003 426 1
8 US 2004 NA 1
9 US 2005 NA 1
10 US 2006 430 1
11 US 2007 NA 1
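The cumsum() trick works because the running count of non-NA scores stays at 0 until a country's first recorded score and is positive from then on. A sketch of the same idea with dplyr::cumany(), which returns the running "any TRUE so far" directly (a swapped-in equivalent, not part of the original answer); it assumes dplyr is installed and rebuilds the question's df so it runs on its own:

```r
library(dplyr)

df <- tibble(
  country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
  year    = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
  score   = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA)
)

out <- df %>%
  arrange(country, year) %>%   # order rows by year within each country
  group_by(country) %>%
  # cumany() is TRUE from the first non-NA score onward; as.numeric() maps it to 0/1
  mutate(before_after = as.numeric(cumany(!is.na(score)))) %>%
  ungroup()

print(out)
```

Either version depends on the rows being in year order within each country, which is why arrange() comes before the grouped mutate().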
Answer 1 (score: 0)
Use group_by together with fill:
library(tidyverse)
# create dataframe
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))
# create before_after variable with case_when
(df <- mutate(df, before_after = case_when(!is.na(score) ~ 1)))
# A tibble: 11 x 4
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA NA
3 US 1999 NA NA
4 US 2000 NA NA
5 US 2001 NA NA
# run fill
df %>%
group_by(country) %>%
fill(before_after)
# A tibble: 11 x 4
# Groups: country [2]
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA 1
3 US 1999 NA NA
4 US 2000 NA NA
5 US 2001 NA NA
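As the printed result shows, fill() only carries the 1 downward, so rows before a country's first score stay NA rather than 0 as the question asks. A minimal sketch that closes that gap with tidyr::replace_na(), assuming dplyr and tidyr are installed; it also sorts by year first, since fill() works in row order:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
  year    = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
  score   = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA)
)

out <- df %>%
  arrange(country, year) %>%                              # fill() follows row order
  group_by(country) %>%
  mutate(before_after = case_when(!is.na(score) ~ 1)) %>% # 1 where scored, NA elsewhere
  fill(before_after, .direction = "down") %>%            # carry the 1 forward within country
  mutate(before_after = replace_na(before_after, 0)) %>% # rows before the first score become 0
  ungroup()

print(out)
```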