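na.locf: negative length vectors are not allowed

Asked: 2019-05-28 14:32:29

Tags: r

I am trying to carry non-NA values forward until the next non-NA value is reached, using the following code: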

test3 <- data.table(final_data)

test3 <- test3[, na.locf(test3, na.rm = F, fromLast = F, maxgap = Inf), by = "gvkey"]

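It works when I leave out the by = "gvkey" part. However, I need the fill to stop when it reaches a new gvkey, because otherwise it carries forward data from the wrong company. My data is in long format (example below). As you can see, without by = "gvkey" the values of gvkey1 would be carried into gvkey2, which I want to avoid. But when I add it, I get the following error message: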

Error in `[.data.table`(test3, , na.locf(test3, na.rm = F, fromLast = F,  : 
  negative length vectors are not allowed

<table><tbody><tr><th>date</th><th>gvkey</th><th>dlcq</th><th>dlttq</th></tr><tr><td>date1</td><td>gvkey1</td><td>10</td><td>20</td></tr><tr><td>date2</td><td>gvkey1</td><td>NA</td><td>NA</td></tr><tr><td>date3</td><td>gvkey1</td><td>NA</td><td>10</td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>date10</td><td>gvkey2</td><td>NA</td><td>NA</td></tr><tr><td>date11</td><td>gvkey2</td><td>10</td><td>NA</td></tr><tr><td>date12</td><td>gvkey2</td><td>NA</td><td>NA</td></tr></tbody></table>

Any suggestions or solutions are very welcome!

2 answers:

Answer 0 (score: 1)

Use data.table::nafill(), available in the latest development version (v1.12.3) of data.table:

library(data.table)

DT <- fread("date | gvkey | dlcq | dlttq
            date1 | gvkey1 | 10 | 20
            date2 | gvkey1 | NA | NA 
            date3 | gvkey1 | NA | 10 
            date10 | gvkey2 | NA | NA 
            date11 | gvkey2 | 10 | NA 
            date12 | gvkey2 | NA | NA")

cols = c("dlcq", "dlttq")
DT[, (cols) := lapply( .SD, nafill, type = "locf" ), by = gvkey, .SDcols = cols][]

#      date  gvkey dlcq dlttq
# 1:  date1 gvkey1   10    20
# 2:  date2 gvkey1   10    20
# 3:  date3 gvkey1   10    10
# 4: date10 gvkey2   NA    NA
# 5: date11 gvkey2   10    NA
# 6: date12 gvkey2   10    NA

See https://github.com/Rdatatable/data.table/wiki/Installation for instructions on installing the dev version.
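If installing the dev version is not an option, the same per-group fill can be sketched with a small base-R helper. This is only an illustration under stated assumptions: locf_base, has_nafill and fill_locf are hypothetical names introduced here, not part of data.table, and the check simply falls back to the helper when nafill is not exported by the installed data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): last observation carried
# forward for a single vector, using base R only.
locf_base <- function(x) {
  idx  <- cumsum(!is.na(x))                  # how many non-NA values seen so far (0 = none yet)
  vals <- c(x[NA_integer_], x[!is.na(x)])    # slot 1 is an NA of the same type as x
  vals[idx + 1L]
}

# Use nafill() when the installed data.table exports it, else the helper.
has_nafill <- "nafill" %in% getNamespaceExports("data.table")
fill_locf <- if (has_nafill) {
  function(x) nafill(x, type = "locf")
} else {
  locf_base
}

DT <- fread("date | gvkey | dlcq | dlttq
            date1 | gvkey1 | 10 | 20
            date2 | gvkey1 | NA | NA
            date3 | gvkey1 | NA | 10
            date10 | gvkey2 | NA | NA
            date11 | gvkey2 | 10 | NA
            date12 | gvkey2 | NA | NA")

cols <- c("dlcq", "dlttq")
# Fill each column separately within each gvkey group, updating DT in place.
DT[, (cols) := lapply(.SD, fill_locf), by = gvkey, .SDcols = cols]
```

Either branch fills only within a gvkey, so leading NAs of a new company stay NA.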

Answer 1 (score: 0)

Using the data shown reproducibly in the Note at the end, I do not get any error message:

library(data.table)
library(zoo)

test3[, na.locf(test3, na.rm = FALSE, fromLast = FALSE, maxgap = Inf), by = "gvkey"]

This gives the following result without an error, although it is not the desired result.

     gvkey   date  gvkey dlcq dlttq
 1: gvkey1  date1 gvkey1   10    20
 2: gvkey1  date2 gvkey1   10    20
 3: gvkey1  date3 gvkey1   10    10
 4: gvkey1 date10 gvkey2   10    10
 5: gvkey1 date11 gvkey2   10    10
 6: gvkey1 date12 gvkey2   10    10
 7: gvkey2  date1 gvkey1   10    20
 8: gvkey2  date2 gvkey1   10    20
 9: gvkey2  date3 gvkey1   10    10
10: gvkey2 date10 gvkey2   10    10
11: gvkey2 date11 gvkey2   10    10
12: gvkey2 date12 gvkey2   10    10
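One way to see why this result has 12 rows rather than 6: test3 inside the brackets is the whole table, not the current group, so each of the two groups returns all 6 rows, filled across company boundaries. The sketch below checks that row count on the sample data; the names whole and n_groups are introduced here for illustration only. On a large table with many gvkey values this groups-times-rows growth could plausibly be related to the reported error, but that is an assumption, since the small example does not reproduce it.

```r
library(data.table)
library(zoo)

test3 <- fread("date | gvkey | dlcq | dlttq
date1 | gvkey1 | 10 | 20
date2 | gvkey1 | NA | NA
date3 | gvkey1 | NA | 10
date10 | gvkey2 | NA | NA
date11 | gvkey2 | 10 | NA
date12 | gvkey2 | NA | NA")

# Referring to test3 inside j evaluates na.locf on the full table once per group.
whole <- test3[, na.locf(test3, na.rm = FALSE), by = "gvkey"]

# The result has (number of groups) x (rows of the full table) rows.
n_groups <- uniqueN(test3$gvkey)
nrow(whole) == n_groups * nrow(test3)
```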

The problem is that, inside test3[...], the correct way to refer to the current group's rows of test3 is .SD, like this:

test3[, na.locf(.SD, na.rm = FALSE, fromLast = FALSE, maxgap = Inf), by = "gvkey"]

giving:

    gvkey   date dlcq dlttq
1: gvkey1  date1   10    20
2: gvkey1  date2   10    20
3: gvkey1  date3   10    10
4: gvkey2 date10   NA    NA
5: gvkey2 date11   10    NA
6: gvkey2 date12   10    NA
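A possible variant of the .SD approach, shown only as a sketch: restrict .SD to the value columns with .SDcols and assign the filled columns back by reference, so that date is left untouched and the original column order is kept. The variable name cols is an assumption made for this example.

```r
library(data.table)
library(zoo)

test3 <- fread("date | gvkey | dlcq | dlttq
date1 | gvkey1 | 10 | 20
date2 | gvkey1 | NA | NA
date3 | gvkey1 | NA | 10
date10 | gvkey2 | NA | NA
date11 | gvkey2 | 10 | NA
date12 | gvkey2 | NA | NA")

# Only these columns are filled; date and gvkey are not passed to na.locf.
cols <- c("dlcq", "dlttq")

# Apply na.locf to each column vector within each gvkey group, in place.
# na.rm = FALSE keeps leading NAs so every group keeps its length.
test3[, (cols) := lapply(.SD, na.locf, na.rm = FALSE), by = "gvkey", .SDcols = cols]
```

Because the fill runs per group, the leading NA values of gvkey2 remain NA instead of inheriting values from gvkey1.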

Note

Lines <- "
date | gvkey | dlcq | dlttq
date1 | gvkey1 | 10 | 20
date2 | gvkey1 | NA | NA 
date3 | gvkey1 | NA | 10 
date10 | gvkey2 | NA | NA 
date11 | gvkey2 | 10 | NA 
date12 | gvkey2 | NA | NA"


library(data.table)
test3 <- fread(Lines)