My data looks like this:
# DT
ID STATE NAME YEAR POPULATION
1: 100325240 AL FAIRHOPE 2007 16985
2: 100325240 AL FAIRHOPE 2008 17134
3: 100325240 AL FAIRHOPE 2009 16185
4: 100325240 AL FAIRHOPE 2010 16409
5: 100325240 AL FAIRHOPE 2011 16588
6: 100325240 AL FAIRHOPE 2012 14184
7: 100325240 AL FAIRHOPE 2013 16385
8: 100325240 AL FAIRHOPE 2014 16794
9: 100325240 AL FAIRHOPE 2015 18089
10: 100524568 AL EUFAULA 1996 13220
11: 100524568 AL EUFAULA 1997 13220
12: 100524568 AL EUFAULA 1998 13220
13: 100524568 AL EUFAULA 1999 13220
14: 100524568 AL EUFAULA 2000 13220
15: 100524568 AL EUFAULA 2001 13908
16: 100524568 AL EUFAULA 2002 13908
17: 100524568 AL EUFAULA 2003 13908
18: 100524568 AL EUFAULA 2004 13908
19: 100524568 AL EUFAULA 2005 13908
20: 100524568 AL EUFAULA 2006 13463
I want to lag the data by group, but after applying shift the result looks strange.
library(data.table)
DT[, POPULATION_TM1 := shift(POPULATION, -1), by = .(ID, STATE, NAME)]
ID STATE NAME YEAR POPULATION POPULATION_TM1
1: 100325240 AL FAIRHOPE 2007 16985 9218868437227407266
2: 100325240 AL FAIRHOPE 2008 17134 16985
3: 100325240 AL FAIRHOPE 2009 16185 17134
4: 100325240 AL FAIRHOPE 2010 16409 16185
5: 100325240 AL FAIRHOPE 2011 16588 16409
6: 100325240 AL FAIRHOPE 2012 14184 16588
7: 100325240 AL FAIRHOPE 2013 16385 14184
8: 100325240 AL FAIRHOPE 2014 16794 16385
9: 100325240 AL FAIRHOPE 2015 18089 16794
10: 100524568 AL EUFAULA 1996 13220 9218868437227407266
11: 100524568 AL EUFAULA 1997 13220 13220
12: 100524568 AL EUFAULA 1998 13220 13220
13: 100524568 AL EUFAULA 1999 13220 13220
14: 100524568 AL EUFAULA 2000 13220 13220
15: 100524568 AL EUFAULA 2001 13908 13220
16: 100524568 AL EUFAULA 2002 13908 13908
17: 100524568 AL EUFAULA 2003 13908 13908
18: 100524568 AL EUFAULA 2004 13908 13908
19: 100524568 AL EUFAULA 2005 13908 13908
20: 100524568 AL EUFAULA 2006 13463 13908
I don't know why the first row of each group is 9218868437227407266 rather than NA. I can't even find 9218868437227407266 anywhere in my data.
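The number is probably not random. One plausible explanation (an assumption, not verified against data.table's source): shift() filled the first row of each group with R's ordinary double NA_real_, but the column has class integer64 (see the dput() output), and bit64 reads the same 64 bits as a signed integer. That would also explain why dput() shows NA at those positions, since dput() prints the underlying double. This base-R sketch only checks the bit pattern of NA_real_:

```r
# R's NA_real_ is an IEEE-754 NaN with a specific payload.
# Dump its 8 bytes in big-endian order (most significant byte first).
na_bytes <- writeBin(NA_real_, raw(), endian = "big")
print(na_bytes)
# 7f f0 00 00 00 00 07 a2, i.e. the 64-bit pattern 0x7FF00000000007A2

# Read as a signed 64-bit integer, that pattern is
#   0x7FF0000000000000 + 0x7A2 = 2047 * 2^52 + 1954 = 9218868437227407266,
# exactly the value that appeared in POPULATION_TM1.
# The full sum exceeds 2^53, so base R doubles cannot represent it exactly;
# here we only extract the low-order payload.
low <- strtoi("7a2", base = 16L)
print(low)
```

If this explanation holds, the stray value is NA_real_ seen through the integer64 class, not a value from your data.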
How can I fix this? DT has more than 36,000 observations.
> sessioninfo::session_info()
- Session info ------------------------------------------------------------- ------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2019-02-10
- Packages -----------------------------------------------------------------------------------------------------------
package * version date lib source
data.table * 1.11.8 2018-09-30 [1] CRAN (R 3.5.1)
> dput(head(DT, 20))
structure(list(ID = structure(c(4.95672544947781e-316, 4.95672544947781e-316,
4.95672544947781e-316, 4.95672544947781e-316, 4.95672544947781e-316,
4.95672544947781e-316, 4.95672544947781e-316, 4.95672544947781e-316,
4.95672544947781e-316, 4.96657356118323e-316, 4.96657356118323e-316,
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316,
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316,
4.96657356118323e-316, 4.96657356118323e-316, 4.96657356118323e-316
), class = "integer64"), STATE = c("AL", "AL", "AL", "AL", "AL",
"AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",
"AL", "AL", "AL", "AL"), NAME = c("FAIRHOPE", "FAIRHOPE",
"FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "FAIRHOPE",
"FAIRHOPE", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA",
"EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA", "EUFAULA"
), YEAR = c(2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L,
2014L, 2015L, 1996L, 1997L, 1998L, 1999L, 2000L, 2001L, 2002L,
2003L, 2004L, 2005L, 2006L), POPULATION = structure(c(8.39170499461357e-320,
8.46532077584392e-320, 7.99645247794058e-320, 8.10712318260901e-320,
8.1955609332146e-320, 7.00782712061224e-320, 8.09526560710882e-320,
8.29733845625789e-320, 8.93715346762231e-320, 6.53154783802128e-320,
6.53154783802128e-320, 6.53154783802128e-320, 6.53154783802128e-320,
6.53154783802128e-320, 6.87146500236006e-320, 6.87146500236006e-320,
6.87146500236006e-320, 6.87146500236006e-320, 6.87146500236006e-320,
6.6516057899607e-320), class = "integer64"), POPULATION_TM1 =
structure(c(NA,
8.39170499461357e-320, 8.46532077584392e-320, 7.99645247794058e-320,
8.10712318260901e-320, 8.1955609332146e-320, 7.00782712061224e-320,
8.09526560710882e-320, 8.29733845625789e-320, NA, 6.53154783802128e-320,
6.53154783802128e-320, 6.53154783802128e-320, 6.53154783802128e-320,
6.53154783802128e-320, 6.87146500236006e-320, 6.87146500236006e-320,
6.87146500236006e-320, 6.87146500236006e-320, 6.87146500236006e-320
), class = "integer64")), class = c("data.table", "data.frame"
), row.names = c(NA, -20L), .internal.selfref = <pointer:
0x0000000005e51ef0>)
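The dput() output shows that ID and POPULATION have class "integer64". One possible workaround is to convert POPULATION to an ordinary integer before lagging. Population counts are far below the 32-bit limit of 2,147,483,647. The sketch below uses a small hypothetical table that mimics the question and assumes the data.table and bit64 packages are installed. It writes the lag explicitly as shift(POPULATION, n = 1L, type = "lag"), which matches the TM1 (t minus 1) column name:

```r
library(data.table)
library(bit64)  # integer64 class and its as.integer() method

# Hypothetical miniature of DT: ID and POPULATION stored as integer64,
# as the dput() output in the question shows.
DT <- data.table(
  ID         = as.integer64(c(100325240, 100325240, 100325240, 100524568, 100524568)),
  STATE      = "AL",
  NAME       = c("FAIRHOPE", "FAIRHOPE", "FAIRHOPE", "EUFAULA", "EUFAULA"),
  YEAR       = c(2007L, 2008L, 2009L, 1996L, 1997L),
  POPULATION = as.integer64(c(16985, 17134, 16185, 13220, 13220))
)

# Population counts fit in a plain 32-bit integer, so drop the integer64 class.
DT[, POPULATION := as.integer(POPULATION)]

# A lag is only meaningful if rows are in chronological order within each group.
setorder(DT, ID, YEAR)

# One-period lag per group; the first row of each group gets NA_integer_.
DT[, POPULATION_TM1 := shift(POPULATION, n = 1L, type = "lag"), by = .(ID, STATE, NAME)]
print(DT)
```

After the conversion, the fill value for the first row of each group is an ordinary integer NA, so no stray 64-bit number should appear. Whether this also avoids the crash described in the update has not been tested here.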
Update: I upgraded R and data.table to the latest versions.
> sessioninfo::session_info()
- Session info -------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.2 (2018-12-20)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2019-02-10
- Packages -----------------------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.1)
cli 1.0.0 2017-11-05 [1] CRAN (R 3.5.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
data.table * 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)
rstudioapi 0.8 2018-10-02 [1] CRAN (R 3.5.1)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.2)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.2)
After installing the latest versions of R and data.table, running the same code makes the RStudio R session stop working: "A problem caused the program to stop working correctly. Please close the program."
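It works with a small dataset. However, DT has 36066 observations of 231 variables. I'm not sure whether that matters, because I only added one new variable, POPULATION_TM1, with shift and did not modify any other variable.
I don't think the other variables are the problem; I think the problem is the 36066 observations. Even after removing the other variables and running the code again, RStudio still crashes.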