I am trying to carry non-NA values forward until the next non-NA value is reached, using the following code:
test3 <- data.table(final_data)
test3 <- test3[, na.locf(test3, na.rm = F, fromLast = F, maxgap = Inf), by = "gvkey"]
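When I leave out the `by = "gvkey"` part, it works. However, I need the fill to stop when it reaches a new gvkey, because otherwise it carries forward data from the wrong company. My data is in long format (example below). As you can see, without `by = "gvkey"` the values of gvkey1 would be carried into gvkey2, which I want to avoid. When I do include it, I get the following error message: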
Error in `[.data.table`(test3, , na.locf(test3, na.rm = F, fromLast = F, : negative length vectors are not allowed
<table><tbody><tr><th>date</th><th>gvkey</th><th>dlcq</th><th>dlttq</th></tr><tr><td>date1</td><td>gvkey1</td><td>10</td><td>20</td></tr><tr><td>date2</td><td>gvkey1</td><td>NA</td><td>NA</td></tr><tr><td>date3</td><td>gvkey1</td><td>NA</td><td>10</td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>.</td><td> </td><td> </td><td> </td></tr><tr><td>date10</td><td>gvkey2</td><td>NA</td><td>NA</td></tr><tr><td>date11</td><td>gvkey2</td><td>10</td><td>NA</td></tr><tr><td>date12</td><td>gvkey2</td><td>NA</td><td>NA</td></tr></tbody></table>
Any suggestions or solutions are very welcome!
Answer 0 (score: 1)
Use data.table::nafill() from the latest development version (v1.12.3).
Sample data.table:
DT <- fread("date | gvkey | dlcq | dlttq
date1 | gvkey1 | 10 | 20
date2 | gvkey1 | NA | NA
date3 | gvkey1 | NA | 10
date10 | gvkey2 | NA | NA
date11 | gvkey2 | 10 | NA
date12 | gvkey2 | NA | NA")
cols = c("dlcq", "dlttq")
DT[, (cols) := lapply( .SD, nafill, type = "locf" ), by = gvkey, .SDcols = cols][]
# date gvkey dlcq dlttq
# 1: date1 gvkey1 10 20
# 2: date2 gvkey1 10 20
# 3: date3 gvkey1 10 10
# 4: date10 gvkey2 NA NA
# 5: date11 gvkey2 10 NA
# 6: date12 gvkey2 10 NA
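See https://github.com/Rdatatable/data.table/wiki/Installation for instructions on installing the development version.
Answer 1 (score: 0)
Using the reproducible data shown in the note at the end, I do not get any error message: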
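If installing a development version is not an option, the same per-group fill can be sketched in base R. This is only an illustration, not part of the answer above: `locf()` is a hypothetical helper name, and the data frame `df` simply re-creates the sample rows by hand.

```r
# Hypothetical helper: last observation carried forward, base R only
locf <- function(x) {
  seen <- cumsum(!is.na(x))                   # non-NA values seen so far (0 = none yet)
  c(x[NA_integer_], x[!is.na(x)])[seen + 1L]  # slot 1 is a typed NA for leading gaps
}

# Sample rows re-created by hand (assumption: numeric columns)
df <- data.frame(
  gvkey = rep(c("gvkey1", "gvkey2"), each = 3),
  dlcq  = c(10, NA, NA, NA, 10, NA),
  dlttq = c(20, NA, 10, NA, NA, NA)
)

# ave() applies locf() within each gvkey and keeps the original row order,
# so values never cross from one gvkey into the next
df$dlcq  <- ave(df$dlcq,  df$gvkey, FUN = locf)
df$dlttq <- ave(df$dlttq, df$gvkey, FUN = locf)
df
```

Because the fill runs separately inside each gvkey, the leading NA values of gvkey2 stay NA instead of inheriting the last value of gvkey1.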
library(data.table)
library(zoo)
test3[, na.locf(test3, na.rm = FALSE, fromLast = FALSE, maxgap = Inf), by = "gvkey"]
This runs without an error, although the result is not the desired one:
gvkey date gvkey dlcq dlttq
1: gvkey1 date1 gvkey1 10 20
2: gvkey1 date2 gvkey1 10 20
3: gvkey1 date3 gvkey1 10 10
4: gvkey1 date10 gvkey2 10 10
5: gvkey1 date11 gvkey2 10 10
6: gvkey1 date12 gvkey2 10 10
7: gvkey2 date1 gvkey1 10 20
8: gvkey2 date2 gvkey1 10 20
9: gvkey2 date3 gvkey1 10 10
10: gvkey2 date10 gvkey2 10 10
11: gvkey2 date11 gvkey2 10 10
12: gvkey2 date12 gvkey2 10 10
The problem is that the correct way to refer to test3 from within test3 is to use .SD, like this:
test3[, na.locf(.SD, na.rm = FALSE, fromLast = FALSE, maxgap = Inf), by = "gvkey"]
giving:
gvkey date dlcq dlttq
1: gvkey1 date1 10 20
2: gvkey1 date2 10 20
3: gvkey1 date3 10 10
4: gvkey2 date10 NA NA
5: gvkey2 date11 10 NA
6: gvkey2 date12 10 NA
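To see why the two calls differ, it may help to compare what `.SD` and `test3` refer to inside a grouped call. The following is a minimal sketch that rebuilds the sample table by hand; the column names `rows_in_SD` and `rows_in_test3` are made up for illustration.

```r
library(data.table)

# Sample table re-created by hand (assumption: same rows as the example data)
test3 <- data.table(
  date  = c("date1", "date2", "date3", "date10", "date11", "date12"),
  gvkey = rep(c("gvkey1", "gvkey2"), each = 3),
  dlcq  = c(10, NA, NA, NA, 10, NA),
  dlttq = c(20, NA, 10, NA, NA, NA)
)

# Inside j with by =, .SD holds only the current group's rows,
# while the name test3 still refers to the whole table
res <- test3[, .(rows_in_SD = nrow(.SD), rows_in_test3 = nrow(test3)), by = gvkey]
res

# .SD also leaves out the grouping column itself
sd_cols <- test3[, .(cols = names(.SD)), by = gvkey]
sd_cols
```

Each group sees 3 rows through `.SD` but all 6 rows through `test3`, which is why filling `test3` inside the grouped call returns the whole, ungrouped fill once per gvkey.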
Note (reproducible input data):
Lines <- "
date | gvkey | dlcq | dlttq
date1 | gvkey1 | 10 | 20
date2 | gvkey1 | NA | NA
date3 | gvkey1 | NA | 10
date10 | gvkey2 | NA | NA
date11 | gvkey2 | 10 | NA
date12 | gvkey2 | NA | NA"
library(data.table)
test3 <- fread(Lines)