data.table shift returns wrong values

Asked: 2019-02-10 05:27:15

Tags: r data.table data-manipulation lag

My data looks like this:

# DT
           ID STATE      NAME YEAR      POPULATION 
 1: 100325240    AL  FAIRHOPE 2007           16985 
 2: 100325240    AL  FAIRHOPE 2008           17134       
 3: 100325240    AL  FAIRHOPE 2009           16185   
 4: 100325240    AL  FAIRHOPE 2010           16409    
 5: 100325240    AL  FAIRHOPE 2011           16588       
 6: 100325240    AL  FAIRHOPE 2012           14184          
 7: 100325240    AL  FAIRHOPE 2013           16385            
 8: 100325240    AL  FAIRHOPE 2014           16794          
 9: 100325240    AL  FAIRHOPE 2015           18089           
10: 100524568    AL   EUFAULA 1996           13220
11: 100524568    AL   EUFAULA 1997           13220              
12: 100524568    AL   EUFAULA 1998           13220           
13: 100524568    AL   EUFAULA 1999           13220          
14: 100524568    AL   EUFAULA 2000           13220            
15: 100524568    AL   EUFAULA 2001           13908           
16: 100524568    AL   EUFAULA 2002           13908             
17: 100524568    AL   EUFAULA 2003           13908            
18: 100524568    AL   EUFAULA 2004           13908            
19: 100524568    AL   EUFAULA 2005           13908           
20: 100524568    AL   EUFAULA 2006           13463           

I want to lag the data by group.
After applying shift, the result looks strange.

library(data.table)
DT[, POPULATION_TM1 := shift(POPULATION, -1), by = .(ID, STATE, NAME)]

           ID STATE      NAME YEAR      POPULATION      POPULATION_TM1
 1: 100325240    AL  FAIRHOPE 2007           16985 9218868437227407266
 2: 100325240    AL  FAIRHOPE 2008           17134               16985
 3: 100325240    AL  FAIRHOPE 2009           16185               17134
 4: 100325240    AL  FAIRHOPE 2010           16409               16185
 5: 100325240    AL  FAIRHOPE 2011           16588               16409
 6: 100325240    AL  FAIRHOPE 2012           14184               16588
 7: 100325240    AL  FAIRHOPE 2013           16385               14184
 8: 100325240    AL  FAIRHOPE 2014           16794               16385
 9: 100325240    AL  FAIRHOPE 2015           18089               16794
10: 100524568    AL   EUFAULA 1996           13220 9218868437227407266
11: 100524568    AL   EUFAULA 1997           13220               13220
12: 100524568    AL   EUFAULA 1998           13220               13220
13: 100524568    AL   EUFAULA 1999           13220               13220
14: 100524568    AL   EUFAULA 2000           13220               13220
15: 100524568    AL   EUFAULA 2001           13908               13220
16: 100524568    AL   EUFAULA 2002           13908               13908
17: 100524568    AL   EUFAULA 2003           13908               13908
18: 100524568    AL   EUFAULA 2004           13908               13908
19: 100524568    AL   EUFAULA 2005           13908               13908
20: 100524568    AL   EUFAULA 2006           13463               13908

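I don't know why the first row of each group is 9218868437227407266 instead of NA. I can't even find 9218868437227407266 anywhere in the data.
How can I fix this?
DT has more than 36,000 observations.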
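
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the `dput` output further down shows that `POPULATION` has class `integer64`, which stores 64-bit integers in the memory of a double vector. If the fill value written by `shift` is an ordinary double `NA` rather than the `integer64` missing value, its bits would be read back as a large integer. The sketch below uses the bit64 package to relabel `NA_real_` as `integer64` and show the effect. The variable name `x` is only illustrative.

```r
library(bit64)

# NA_real_ is a double with a special bit pattern.
x <- NA_real_

# Relabel the same 8 bytes as integer64 without converting the value.
class(x) <- "integer64"

# bit64 does not recognise these bits as its own missing value ...
is.na(x)

# ... so the value prints as an ordinary (very large) integer.
print(x)
```

On an x86-64 setup like the one in the question, the printed number would be expected to match the 9218868437227407266 seen in the output above. That would also be consistent with `dput` showing `NA` in the same positions.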

> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2019-02-10                  

- Packages -----------------------------------------------------------------------------------------------------------
 package     * version date       lib source
 data.table  * 1.11.8  2018-09-30 [1] CRAN (R 3.5.1)


> dput(head(DT, 20))
structure(list(ID = structure(c(4.95672544947781e-316, 4.95672544947781e-316, 
4.95672544947781e-316, 4.95672544947781e-316, 4.95672544947781e-316, 
4.95672544947781e-316, 4.95672544947781e-316, 4.95672544947781e-316, 
4.95672544947781e-316, 4.96657356118323e-316, 4.96657356118323e-316, 
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316, 
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316, 
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316
), class = "integer64"), STATE = c("AL", "AL", "AL", "AL", "AL", 
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
"AL", "AL", "AL", "AL"), NAME = c("FAIRHOPE", "FAIRHOPE", 
"FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE", 
"FAIRHOPE", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", 
"EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA"
), YEAR = c(2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 
2014L, 2015L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L, 
2003L, 2004L, 2005L, 2006L), POPULATION = structure(c(8.39170499461357e-320, 
8.46532077584392e-320, 7.99645247794058e-320, 8.10712318260901e-320, 
8.1955609332146e-320, 7.00782712061224e-320, 8.09526560710882e-320, 
8.29733845625789e-320, 8.93715346762231e-320, 6.53154783802128e-320, 
6.53154783802128e-320, 6.53154783802128e-320, 6.53154783802128e-320, 
6.53154783802128e-320, 6.87146500236006e-320, 6.87146500236006e-320, 
6.87146500236006e-320, 6.87146500236006e-320, 6.87146500236006e-320, 
6.6516057899607e-320), class = "integer64"), POPULATION_TM1 = 
structure(c(NA, 
8.39170499461357e-320, 8.46532077584392e-320, 7.99645247794058e-320, 
8.10712318260901e-320, 8.1955609332146e-320, 7.00782712061224e-320, 
8.09526560710882e-320, 8.29733845625789e-320, NA, 6.53154783802128e-320, 
6.53154783802128e-320, 6.53154783802128e-320, 6.53154783802128e-320, 
6.53154783802128e-320, 6.87146500236006e-320, 6.87146500236006e-320, 
6.87146500236006e-320, 6.87146500236006e-320, 6.87146500236006e-320
), class = "integer64")), class = c("data.table", "data.frame"
), row.names = c(NA, -20L), .internal.selfref = <pointer: 
0x0000000005e51ef0>)
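
Since the `dput` output shows `ID` and `POPULATION` as `integer64`, one workaround to try is converting `POPULATION` to a plain integer before shifting, so that `shift` never has to fill an `integer64` column. This is a minimal sketch on a hypothetical five-row subset of the data, assuming every population value fits in a 32-bit integer. It also spells out `n = 1L, type = "lag"` so the direction does not depend on how the installed data.table version treats a negative `n`. It has not been checked against the full 36066-row table.

```r
library(data.table)
library(bit64)

# Hypothetical subset of the question's data, with integer64 columns
# as in the dput output.
DT <- data.table(
  ID         = as.integer64(c(100325240, 100325240, 100325240,
                              100524568, 100524568)),
  NAME       = c("FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "EUFAULA", "EUFAULA"),
  YEAR       = c(2007L, 2008L, 2009L, 1996L, 1997L),
  POPULATION = as.integer64(c(16985, 17134, 16185, 13220, 13220))
)

# Assumption: all values are below .Machine$integer.max, so nothing is lost.
DT[, POPULATION := as.integer(POPULATION)]

# Lag by one row within each group; the first row of each group gets NA.
DT[, POPULATION_TM1 := shift(POPULATION, n = 1L, type = "lag"),
   by = .(ID, NAME)]

print(DT)
```

The rows are assumed to be sorted by `YEAR` within each group, as in the question. If they are not, calling `setorder(DT, ID, YEAR)` first would put them in order.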

Update: after upgrading to the latest versions of R and data.table

> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/New_York            
 date     2019-02-10                  

- Packages -----------------------------------------------------------------------------------------------------------
 package     * version date       lib source        
 assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)
 cli           1.0.0   2017-11-05 [1] CRAN (R 3.5.1)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
 data.table  * 1.12.0  2019-01-13 [1] CRAN (R 3.5.2)
 rstudioapi    0.8     2018-10-02 [1] CRAN (R 3.5.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.2)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.2)

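After I installed the latest versions of R and data.table, running the same code makes RStudio crash with the message: "R Session has stopped working. A problem caused the program to stop working correctly. Please close the program."
If I use a small data set, it works. However, DT has 36066 obs. of 231 variables, and I am not sure whether that matters. I only add one new variable, POPULATION_TM1, with the shift function, and do not touch any other variable.
I don't think the number of variables is the problem; the problem seems to be the 36066 obs. Even after I drop the other variables and run the code again, it still crashes my RStudio session.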

0 answers:

No answers yet.