I have a list of yearly, country-specific dummies, and I want to also flag the two years before each flagged year.
The data look like this:
library(tidyverse)
df <- tribble(
~year, ~country, ~occurrence,
#--|--|----
2003, "USA", 1,
2004, "USA", 0,
2005, "USA", 0,
2006, "USA", 0,
2007, "USA", 0,
2008, "USA", 0,
2009, "USA", 0,
2010, "USA", 0,
2011, "USA", 1,
2012, "USA", 0,
2013, "USA", 0,
2005, "FRA", 0,
2006, "FRA", 0,
2007, "FRA", 1,
2008, "FRA", 1,
2009, "FRA", 0,
2010, "FRA", 0,
2011, "FRA", 0,
2012, "FRA", 0,
2013, "FRA", 0,
2014, "FRA", 0,
2015, "FRA", 1
)
So for "USA" I would also like a 1 in the occurrence column for 2009 and 2010, and for "FRA" in 2005, 2006, 2013 and 2014.
I thought about doing something like this:
df %>%
  group_by(country) %>%
  mutate(occurrence = ifelse("not sure what to put here", 1, 0))
But I don't know how to tell R to pick only the years I want.
Answer 0 (score: 2)
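After grouping by 'country', we can take the lead of 'occurrence' one and two rows ahead and use pmax to get the row-wise maximum of the three vectors, which gives the expected output for 'occurrence':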
df %>%
  group_by(country) %>%
  mutate(occurrence = pmax(occurrence,
                           lead(occurrence, n = 1, default = 0),
                           lead(occurrence, n = 2, default = 0)))
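To make the mechanics of the lead/pmax combination concrete, here is a minimal sketch on a toy vector. The vector `x` and the names `ahead1`, `ahead2` and `flagged` are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Toy vector (an assumption for illustration): one flagged position, index 4
x <- c(0, 0, 0, 1, 0)

# lead() shifts values toward the front; `default` pads the end of the vector
ahead1 <- lead(x, n = 1, default = 0)  # value one row later
ahead2 <- lead(x, n = 2, default = 0)  # value two rows later

# Element-wise maximum: 1 if this row, the next row, or the row after is flagged
flagged <- pmax(x, ahead1, ahead2)
flagged
# [1] 0 1 1 1 0
```

Inside a grouped mutate the same computation runs once per country, so a flag never leaks from the first rows of one country into the last rows of another.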
Alternatively, a similar approach with data.table achieves the same result:
library(data.table)
setDT(df)[, occurrence := do.call(pmax, shift(occurrence, n = 0:2,
                                              type = "lead", fill = 0)), country]
df
# year country occurrence
# 1: 2003 USA 1
# 2: 2004 USA 0
# 3: 2005 USA 0
# 4: 2006 USA 0
# 5: 2007 USA 0
# 6: 2008 USA 0
# 7: 2009 USA 1
# 8: 2010 USA 1
# 9: 2011 USA 1
#10: 2012 USA 0
#11: 2013 USA 0
#12: 2005 FRA 1
#13: 2006 FRA 1
#14: 2007 FRA 1
#15: 2008 FRA 1
#16: 2009 FRA 0
#17: 2010 FRA 0
#18: 2011 FRA 0
#19: 2012 FRA 0
#20: 2013 FRA 1
#21: 2014 FRA 1
#22: 2015 FRA 1
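Both versions above work on row positions, so they implicitly assume that rows are sorted by year within each country and that there are no missing years. The sketch below is one possible way to make the sort explicit and to generalize "two years" to `k` years. The helper `flag_with_lead` and the `toy` data are hypothetical names introduced here, not part of the original answer, and the no-gaps assumption still applies.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# flag each 1 in `x` together with the `k` positions before it.
flag_with_lead <- function(x, k = 2) {
  # One shifted copy of x per look-ahead distance 1..k
  shifted <- lapply(seq_len(k), function(i) lead(x, n = i, default = 0))
  # Element-wise maximum of x and all shifted copies
  Reduce(pmax, shifted, x)
}

toy <- tibble(
  year       = c(2003, 2001, 2002, 2004),  # deliberately unsorted
  country    = "USA",
  occurrence = c(0, 0, 0, 1)
)

result <- toy %>%
  arrange(country, year) %>%  # lead() follows row order, so sort first
  group_by(country) %>%
  mutate(occurrence = flag_with_lead(occurrence, k = 2)) %>%
  ungroup()

result$occurrence
# [1] 0 1 1 1
```

If the panel can have gaps between years, a position-based shift would flag the wrong rows, and a comparison on the year values themselves would be safer.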
Answer 1 (score: 1)
Here is another dplyr solution:
df %>%
  group_by(country) %>%
  mutate(
    occurrence = ifelse(lead(occurrence, 1) %in% 1 |
                          lead(occurrence, 2) %in% 1,
                        1, occurrence)
  )
# A tibble: 22 x 3
# Groups: country [2]
year country occurrence
<dbl> <chr> <dbl>
1 2003 USA 1
2 2004 USA 0
3 2005 USA 0
4 2006 USA 0
5 2007 USA 0
6 2008 USA 0
7 2009 USA 1
8 2010 USA 1
9 2011 USA 1
10 2012 USA 0
11 2013 USA 0
12 2005 FRA 1
13 2006 FRA 1
14 2007 FRA 1
15 2008 FRA 1
16 2009 FRA 0
17 2010 FRA 0
18 2011 FRA 0
19 2012 FRA 0
20 2013 FRA 1
21 2014 FRA 1
22 2015 FRA 1
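Use lead(occurrence, 1) %in% 1 instead of lead(occurrence, 1) == 1. Without a default, lead() returns NA for the last rows of each group; == propagates that NA into the condition, whereas %in% returns FALSE.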
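The difference between the two comparisons can be seen on a small vector containing an NA. This is a base R sketch; the names `x`, `eq_result` and `in_result` are made up for illustration.

```r
# Toy vector with a missing value, standing in for what lead() returns
# at the end of a group when no default is given
x <- c(0, 1, NA)

eq_result <- x == 1    # NA is propagated:    FALSE TRUE NA
in_result <- x %in% 1  # NA counts as no match: FALSE TRUE FALSE

# ifelse() returns NA wherever its test is NA
with_eq <- ifelse(eq_result, 1, 0)  # 0 1 NA
with_in <- ifelse(in_result, 1, 0)  # 0 1 0
```

With ==, the last rows of each country would end up as NA rather than keeping their original value.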