Weird shift/lag results from data.table

Time: 2019-04-24 14:41:40

Tags: r data.table

Consider this table a, which contains a person number, a date (in years), and data.

a = data.table(person = c(1,1,1,2,3,3,3,4,4,5,5,5,5,5), date = c(2010,2011,2012,2010,2010,2011,2012,2010,2011,2010,2011,2012,2013,2014), data = c(9,7,6,4,3,3,5,1,6,5,7,8,4,9))

I want to shift "date" by person, so I do this:

a <- a[order(date)]
a[, date := shift(date, 1L, type = "lag"), by=.(person)]

    person date data
 1:      1   NA    9
 2:      2   NA    4
 3:      3   NA    3
 4:      4   NA    1
 5:      5   NA    5
 6:      1 2010    7
 7:      3 2010    3
 8:      4 2010    6
 9:      5 2010    7
10:      1 2011    6
11:      3 2011    5
12:      5 2011    8
13:      5 2012    4
14:      5 2013    9

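This is correct. But then I run the same code again to shift by one more year (I want the result to look as if date had been shifted by 2 lags):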

a <- a[order(date)]
a[, date := shift(date, 1L, type = "lag"), by=.(person)]

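One would expect the second shift to drop date 2013 for person 5, date 2010 for person 4, date 2011 for person 3, and date 2011 for person 1. This is the desired (correct) result: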

    person date data
 1:      5 2010    9
 2:      1 2010    7
 3:      3 2010    3
 4:      5 2011    5
 5:      5 2012    7
 6:      1   NA    6
 7:      3   NA    5
 8:      5   NA    8
 9:      4   NA    1
10:      5   NA    4
11:      1   NA    9
12:      3   NA    3
13:      4   NA    6
14:      2   NA    4

Instead, running the shift operation a second time gives this weird output:

    person date data
 1:      1 2010    6
 2:      3 2010    5
 3:      5 2010    8
 4:      4 2010    1
 5:      5 2011    4
 6:      1 2011    9
 7:      3 2011    3
 8:      5 2012    9
 9:      5 2013    5
10:      1   NA    7
11:      3   NA    3
12:      4   NA    6
13:      5   NA    7
14:      2   NA    4

It seems to be recycling observations?

1 Answer:

Answer 0: (score: 1)

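Remove the second reassignment and order call. order(date) puts the NA values at the end. shift just operates on the vector it is given, in its current row order. Because the NA rows now come last within each person, the second shift moves real date values into those rows instead of pushing in the extra NA you expect, as the reprex at the end of this answer shows.

Alternatively, you can use the na.last argument in your order call, i.e. a <- a[order(date, na.last = FALSE)]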
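As a minimal sketch of the na.last variant, assuming the same table a as in the question: both passes sort first, but the second sort keeps the NA rows at the top, so within each person the already-emptied row still comes before the rows that hold a year.

```r
library(data.table)

a <- data.table(
  person = c(1,1,1,2,3,3,3,4,4,5,5,5,5,5),
  date   = c(2010,2011,2012,2010,2010,2011,2012,2010,2011,2010,2011,2012,2013,2014),
  data   = c(9,7,6,4,3,3,5,1,6,5,7,8,4,9)
)

# First lag: identical to the question.
a <- a[order(date)]
a[, date := shift(date, 1L, type = "lag"), by = .(person)]

# Second lag: sort with NA first, so each person's NA row stays ahead of
# the rows that still carry a year, then shift again.
a <- a[order(date, na.last = FALSE)]
a[, date := shift(date, 1L, type = "lag"), by = .(person)]

# Inspect one person: the two earliest rows are NA, the rest lag by two years.
a[person == 5]
```

For person 5 this should leave NA in the two earliest rows and 2010, 2011, 2012 in the remaining three, which is the same result as skipping the second sort.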

library(data.table)
#> Warning: package 'data.table' was built under R version 3.4.4
a = data.table(person = c(1,1,1,2,3,3,3,4,4,5,5,5,5,5), date = c(2010,2011,2012,2010,2010,2011,2012,2010,2011,2010,2011,2012,2013,2014), data = c(9,7,6,4,3,3,5,1,6,5,7,8,4,9))

a <- a[order(date)]
a[, date := shift(date, 1L, type = "lag"), by=.(person)]
a[]
#>     person date data
#>  1:      1   NA    9
#>  2:      2   NA    4
#>  3:      3   NA    3
#>  4:      4   NA    1
#>  5:      5   NA    5
#>  6:      1 2010    7
#>  7:      3 2010    3
#>  8:      4 2010    6
#>  9:      5 2010    7
#> 10:      1 2011    6
#> 11:      3 2011    5
#> 12:      5 2011    8
#> 13:      5 2012    4
#> 14:      5 2013    9

# Note I'm not reassigning here, just showing for demonstrative purposes
# Notice NA placement
a[order(date), ] 
#>     person date data
#>  1:      1 2010    7
#>  2:      3 2010    3
#>  3:      4 2010    6
#>  4:      5 2010    7
#>  5:      1 2011    6
#>  6:      3 2011    5
#>  7:      5 2011    8
#>  8:      5 2012    4
#>  9:      5 2013    9
#> 10:      1   NA    9
#> 11:      2   NA    4
#> 12:      3   NA    3
#> 13:      4   NA    1
#> 14:      5   NA    5

# what you expect to see
a[, date := shift(date, 1L, type = "lag"), by=.(person)]

a[]
#>     person date data
#>  1:      1   NA    9
#>  2:      2   NA    4
#>  3:      3   NA    3
#>  4:      4   NA    1
#>  5:      5   NA    5
#>  6:      1   NA    7
#>  7:      3   NA    3
#>  8:      4   NA    6
#>  9:      5   NA    7
#> 10:      1 2010    6
#> 11:      3 2010    5
#> 12:      5 2010    8
#> 13:      5 2011    4
#> 14:      5 2012    9

Created on 2019-04-24 by the reprex package (v0.2.1)
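If the goal is simply a two-period lag, a single shift call with n = 2L may avoid the repeated sort-and-shift altogether. The sketch below assumes the original, unshifted table and writes the result to a new column, date_lag2 (a hypothetical name chosen for illustration), so the original date values stay available.

```r
library(data.table)

a <- data.table(
  person = c(1,1,1,2,3,3,3,4,4,5,5,5,5,5),
  date   = c(2010,2011,2012,2010,2010,2011,2012,2010,2011,2010,2011,2012,2013,2014),
  data   = c(9,7,6,4,3,3,5,1,6,5,7,8,4,9)
)

# Sort once so that, within each person, years are in ascending order.
a <- a[order(date)]

# Lag by two rows per person in one step; date_lag2 is an illustrative name.
a[, date_lag2 := shift(date, n = 2L, type = "lag"), by = .(person)]

a[person == 5]
```

Because the sort happens only once, before any NA values exist in date, the NA placement issue described above does not come into play.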