I want to conditionally repeat the value of a variable. For example, I have this data.frame:
cod ano partido_prefeito
1 110001 1998 <NA>
2 110001 1999 <NA>
3 110001 2000 <NA>
4 110001 2001 PPB
5 110001 2002 <NA>
6 110001 2003 <NA>
7 110001 2004 <NA>
8 110001 2005 PDT
9 110001 2006 <NA>
10 110001 2007 <NA>
11 110001 2008 <NA>
12 110001 2009 PTN
13 110001 2010 <NA>
14 110001 2011 <NA>
15 110001 2012 <NA>
16 110001 2013 PMDB
17 110001 2014 <NA>
18 110001 2015 <NA>
19 110001 2016 <NA>
20 110002 1998 <NA>
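The variable partido_prefeito has many NA observations. I want to repeat each observed value over the following 3 years, until the observed value changes, so that the data look something like this: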
cod ano partido_prefeito
1 110001 1998 <NA>
2 110001 1999 <NA>
3 110001 2000 <NA>
4 110001 2001 PPB
5 110001 2002 PPB
6 110001 2003 PPB
7 110001 2004 PPB
8 110001 2005 PDT
9 110001 2006 PDT
10 110001 2007 PDT
11 110001 2008 PDT
12 110001 2009 PTN
13 110001 2010 PTN
14 110001 2011 PTN
15 110001 2012 PTN
16 110001 2013 PMDB
17 110001 2014 PMDB
18 110001 2015 PMDB
19 110001 2016 PMDB
20 110002 1998 <NA>
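For the first 3 years (1998, 1999, 2000) the data stay NA. An important detail: I have many observations for different values of cod. How can I do this transformation easily?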
Answer 0 (score: 1)
This is a perfect time to use fill from the tidyverse.
First, make sure your <NA> values are actual NA values and not strings. Then:
library(tidyverse)
data %>% group_by(cod) %>% fill(partido_prefeito)
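fill takes the last non-missing value and carries it down into the NA rows that follow. Because of group_by(cod), a value is never carried from one cod into the next. The only requirement is that the data contain actual NA values. If the missing values are stored as the string "<NA>", convert them to NA first.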
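If the missing values did arrive as the literal string "<NA>", one way to convert them is na_if from dplyr. The following is a minimal sketch on a small made-up data frame (raw and cleaned are hypothetical names, not part of the question's data), assuming the column is a character vector rather than a factor:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: missing values stored as the string "<NA>"
raw <- data.frame(
  cod = c(110001L, 110001L, 110001L),
  ano = c(2000L, 2001L, 2002L),
  partido_prefeito = c("<NA>", "PPB", "<NA>"),
  stringsAsFactors = FALSE
)

# Replace the placeholder string with a real NA, then fill down within each cod
cleaned <- raw %>%
  mutate(partido_prefeito = na_if(partido_prefeito, "<NA>")) %>%
  group_by(cod) %>%
  fill(partido_prefeito) %>%
  ungroup()
```

Applied to the question's data, where the NA values are already real, the answer's pipeline gives: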
1 110001 1998 <NA>
2 110001 1999 <NA>
3 110001 2000 <NA>
4 110001 2001 PPB
5 110001 2002 PPB
6 110001 2003 PPB
7 110001 2004 PPB
8 110001 2005 PDT
9 110001 2006 PDT
10 110001 2007 PDT
11 110001 2008 PDT
12 110001 2009 PTN
13 110001 2010 PTN
14 110001 2011 PTN
15 110001 2012 PTN
16 110001 2013 PMDB
17 110001 2014 PMDB
18 110001 2015 PMDB
19 110001 2016 PMDB
20 110002 1998 <NA>
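In the question's data every observed party is followed by exactly three NA rows, so plain fill already matches the "next 3 years" requirement. If longer gaps can occur and the value must not be carried past 3 years, a cap has to be added by hand. The sketch below is one possible approach, not part of the original answer; it assumes rows are sorted by ano within each cod with one row per year, and df, run, filled and capped are hypothetical names:

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: PPB is followed by four NA years
df <- data.frame(
  cod = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L),
  ano = c(2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2000L),
  partido = c(NA, "PPB", NA, NA, NA, NA, "PDT", NA),
  stringsAsFactors = FALSE
)

capped <- df %>%
  group_by(cod) %>%
  # run increases by one at every observed value, so each run starts with it
  mutate(run = cumsum(!is.na(partido)), filled = partido) %>%
  fill(filled) %>%
  group_by(cod, run) %>%
  # keep the observed row plus at most 3 filled rows per run
  mutate(filled = if_else(row_number() <= 4L, filled, NA_character_)) %>%
  ungroup() %>%
  select(-run)
```

Here the 2005 row stays NA because it is the fourth year after PPB was observed.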
data <- structure(list(cod = c(110001L, 110001L, 110001L, 110001L, 110001L,
110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L,
110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L,
110002L), ano = c(1998L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L,
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 2016L, 1998L), partido_prefeito = structure(c(NA,
NA, NA, 3L, NA, NA, NA, 1L, NA, NA, NA, 4L, NA, NA, NA, 2L, NA,
NA, NA, NA), .Label = c("PDT", "PMDB", "PPB", "PTN"), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))