How do I generate a lagged variable that captures the previous value (endogenous lag)?

Date: 2018-06-18 09:48:55

Tags: r function dplyr data.table

I want to generate the following endogenous lag variable (Y):

Set Y=1 in the current routine year if submission==1 and routineyear==1 in the previous routine year.

Set Y=2 in the current routine year if submission==0 and routineyear==1 in the previous routine year.

Otherwise Y=0.

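Note that the "previous routine year" is not the previous calendar year: the gaps between routine years vary. This is what makes the variable hard for me to generate.

Basically, I want an endogenous variable that captures how a state behaved in its last routineyear.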

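To illustrate what I want to do:

Suppose country A has a routine year in 1990, and submission is also 1 in that year. This generates Y=1.

The country's next routineyear is 1992, with submission=1 and routineyear=1 in that year. The endogenous lag should reflect A's previous behavior, i.e. that of 1990 (Y=1).

The next routineyear is 1996, where submission=0 and routineyear=1. In this case the endogenous lag takes the value of A's previous behavior in 1992 (Y=1).

The next routineyear is 1998, where submission=1 and routineyear=1. Here the endogenous lag should reflect A's behavior in its last routineyear, 1996. That is: Y=2.

The endogenous lag should look like this (based on the example above):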

country year     submission routineyear  Y(endo lag)
A       1990          1            1     1  
A       1991          0            0     0
A       1992          1            1     1 
A       1993          1            0     0
A       1994          0            0     0
A       1995          0            0     0
A       1996          0            1     1
A       1997          0            0     0
A       1998          1            1     2
A       1999          0            0     0
A       2000          0            0     0
A       2001          0            1     1
A       2002          0            0     0
A       2003          1            1     2

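I have tried several different approaches without success. One of the biggest problems is that the routine years differ from country to country and the gaps between them are irregular.

I believe someone who can write the right code/function in R will be able to solve this puzzle. If not, I would appreciate any suggestions on how to proceed from here.

A sample of my real data: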

  

structure(list(ccode = c(
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L, 40L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L,
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L, 42L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L, 51L,
52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L,
52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L, 52L,
53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L,
53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L, 53L,
54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L,
54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L, 54L,
70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L,
70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L, 70L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L,
90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L,
90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L, 90L), year = c(
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L
), country = structure(c(
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L
), .Label = c("Bahamas", "Barbados", "Belize", "Cuba", "Dominica",
"Dominican Republic", "Guatemala", "Haiti", "Jamaica", "Mexico",
"Trinidad and Tobago"), class = "factor"),
    submission = c(1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
    1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,
    1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L,
    1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
    0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
    0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
    0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
    1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
    0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
    1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L,
    0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
    1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
    0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
    1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L,
    0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L,
    1L, 0L, 1L, 0L, 1L, 0L, 0L), routineyear = c(1L, 0L, 0L,
    1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
    1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L,
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
    0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L,
    0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
    1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L,
    0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
    0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
    0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L
    )), .Names = c("ccode", "year", "country", "submission", "routineyear"
), class = "data.frame", row.names = c(NA, -243L))

2 answers:

Answer 0: (score: 2)

Using

library(data.table)
setDT(DF)

DF[, Y := 0
   ][routineyear == 1
     , Y := 1 + (shift(submission, fill = 1) == 0)
     , by = country][]

gives (first 15 rows shown):

> DF
    ccode year country submission routineyear Y
 1:    31 1990 Bahamas          1           1 1
 2:    31 1991 Bahamas          0           0 0
 3:    31 1992 Bahamas          0           0 0
 4:    31 1993 Bahamas          0           1 1
 5:    31 1994 Bahamas          0           0 0
 6:    31 1995 Bahamas          1           0 0
 7:    31 1996 Bahamas          0           0 0
 8:    31 1997 Bahamas          1           1 2
 9:    31 1998 Bahamas          0           0 0
10:    31 1999 Bahamas          1           1 1
11:    31 2000 Bahamas          0           0 0
12:    31 2001 Bahamas          1           1 1
13:    31 2002 Bahamas          0           0 0
14:    31 2003 Bahamas          1           1 1
15:    31 2004 Bahamas          0           0 0
........

What this does:

  • setDT(DF) converts your data frame to a data.table
  • Y := 0 first sets Y to 0 by reference
  • the rows are then filtered to routineyear == 1
  • within that subset, and by country, Y is updated by reference: shift(submission, fill = 1) returns the submission value of the previous routine year (1 for a country's first routine year), so Y becomes 1 when the previous submission was 1 and 2 when it was 0. This relies on the rows being ordered by year within each country.
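The same steps can also be sketched in base R, which may make the logic easier to follow. This is only an illustration on the single-country example from the question: `endo_lag` is a hypothetical helper name, and it assumes the rows are already sorted by year and that a country's first routine year defaults to Y = 1, as in the question's table.

```r
# Hypothetical helper: endogenous lag for ONE country, rows sorted by year.
endo_lag <- function(submission, routineyear) {
  y <- integer(length(submission))      # non-routine years stay 0
  idx <- which(routineyear == 1)        # positions of the routine years
  if (length(idx) > 0) {
    # submission in the previous routine year;
    # assumption: treated as 1 before the first routine year
    prev_sub <- c(1L, submission[idx][-length(idx)])
    y[idx] <- ifelse(prev_sub == 1, 1L, 2L)
  }
  y
}

# Country A from the question, 1990 to 2003
submission  <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
routineyear <- c(1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1)

endo_lag(submission, routineyear)
# [1] 1 0 1 0 0 0 1 0 2 0 0 1 0 2
```

For several countries, a helper like this would have to be applied per country, which is what by = country does in the data.table version.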

Answer 1: (score: 1)

library(dplyr)

select(dat2, -Y) %>% 
  filter(routineyear == 1L) %>% 
  group_by(country) %>% 
  mutate(Y = 2L - lag(submission, default = 1L)) %>% 
  ungroup() %>% 
  right_join(select(dat2, -Y)) %>% 
  mutate(Y = replace(Y, is.na(Y), 0L))

# # A tibble: 14 x 5
#    country  year submission routineyear     Y
#    <fct>   <int>      <int>       <int> <int>
#  1 A        1990          1           1     1
#  2 A        1991          0           0     0
#  3 A        1992          1           1     1
#  4 A        1993          1           0     0
#  5 A        1994          0           0     0
#  6 A        1995          0           0     0
#  7 A        1996          0           1     1
#  8 A        1997          0           0     0
#  9 A        1998          1           1     2
# 10 A        1999          0           0     0
# 11 A        2000          0           0     0
# 12 A        2001          0           1     1
# 13 A        2002          0           0     0
# 14 A        2003          1           1     2

all.equal(.Last.value, dat2)
# [1] TRUE

where dat2 is:

dat2 <- read.table(text = 
"country year     submission routineyear  Y
A       1990          1            1     1  
A       1991          0            0     0
A       1992          1            1     1 
A       1993          1            0     0
A       1994          0            0     0
A       1995          0            0     0
A       1996          0            1     1
A       1997          0            0     0
A       1998          1            1     2
A       1999          0            0     0
A       2000          0            0     0
A       2001          0            1     1
A       2002          0            0     0
A       2003          1            1     2
", header = TRUE)
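A possible variant that skips the filter-and-join round trip is to fill Y inside a single grouped mutate(), so the result does not depend on how a join orders its rows. This is a sketch rather than a tested drop-in replacement: it rebuilds the example data without the Y column, assumes integer columns, and sorts by year first because lag() depends on row order.

```r
library(dplyr)

# Example data from the question, without the expected Y column
dat2 <- data.frame(
  country     = "A",
  year        = 1990:2003,
  submission  = c(1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L),
  routineyear = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L)
)

res <- dat2 %>%
  arrange(country, year) %>%            # lag() relies on chronological order
  group_by(country) %>%
  mutate(
    Y = replace(
      integer(n()),                     # start from 0 for every row
      routineyear == 1L,                # overwrite only the routine years
      # previous routine year's submission: 1 -> Y = 1, 0 -> Y = 2;
      # assumption: the first routine year is treated as if preceded by 1
      2L - lag(submission[routineyear == 1L], default = 1L)
    )
  ) %>%
  ungroup()

res$Y
# [1] 1 0 1 0 0 0 1 0 2 0 0 1 0 2
```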