Question

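The shift function in R's data.table is great for time series and time window work. But list columns don't lag the same way that columns of other elements do. In my code, gearLag correctly leads/lags gear, but gearsListLag doesn't lag gearsList; instead, shift operates within gearsList and lags the elements inside each row.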

Any suggestions for lagging lists the same way I lag other elements?

Answer 1

This is documented behavior. Here is part of the examples from ?shift:

# on lists
ll = list(1:3, letters[4:1], runif(2))
shift(ll, 1, type="lead")

# [[1]]
# [1]  2  3 NA
# 
# [[2]]
# [1] "c" "b" "a" NA 
# 
# [[3]]
# [1] 0.1190792        NA

To work around this, you can create a unique ID for each value of the list:

dt[, carbList_id := match(carbList, unique(carbList))]

carbList_map = dt[, .(carbList = list(carbList[[1]])), by=carbList_id]

#    carbList_id carbList
# 1:           1        4
# 2:           2      1,2
# 3:           3        1
# 4:           4    2,4,3
# 5:           5        2
# 6:           6      4,8
# 7:           7        6

# or stick with long-form:
carbList_map = dt[, .(carb = carbList[[1]]), by=carbList_id]

#     carbList_id carb
#  1:           1    4
#  2:           2    1
#  3:           2    2
#  4:           3    1
#  5:           4    2
#  6:           4    4
#  7:           4    3
#  8:           5    2
#  9:           6    4
# 10:           6    8
# 11:           7    6

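Then just shift (or whatever else you need) on the new ID column. When you need the values of carbList again, you will have to merge with the new table.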
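The lag-then-look-up step described above could be sketched roughly as follows. The setup lines assume dt is built as in Answer 2 but with the column named carbList; the names carbList_id_lag and carbListLag are hypothetical, and the sketch uses a match() lookup against the map rather than a full merge().

```r
library(data.table)

# assumed setup: same construction as in Answer 2, with the column named carbList
dt <- data.table(mtcars)[, .(gear, carb, cyl)]
dt[, carbList := list(list(unique(carb))), by = .(cyl, gear)]
dt[, carbList_id := match(carbList, unique(carbList))]
carbList_map <- dt[, .(carbList = list(carbList[[1]])), by = carbList_id]

# lag the plain integer ID within each cyl group (hypothetical column name)
dt[, carbList_id_lag := shift(carbList_id), by = cyl]

# look the lagged IDs up in the map to recover the list values;
# an NA ID matches nothing, so those rows get a NULL element
idx <- match(dt$carbList_id_lag, carbList_map$carbList_id)
dt[, carbListLag := list(carbList_map$carbList[idx])]
```

Wrapping the right-hand side in list() tells data.table that the whole lagged list is a single list column rather than a list of columns.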

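Alternatively, if you don't need to work with the values and only want to browse them, consider storing them as a string instead, e.g. carbList:=toString(sort(unique(carb))) or with paste0.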

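Side note: sort before using toString, paste0 or list, so that groups containing the same values in a different order get the same representation.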
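A minimal sketch of the string alternative, assuming the same mtcars-based table as in Answer 2; the column names carbStr and carbStrLag are hypothetical. Because the column is an ordinary character vector, shift lags it row by row.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]

# one string per (cyl, gear) group; sorting first makes the result
# independent of the order in which values appear
dt[, carbStr := toString(sort(unique(carb))), by = .(cyl, gear)]

# a character column lags like any other atomic column
dt[, carbStrLag := shift(carbStr), by = cyl]
```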

Answer 2

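User Frank notes that shift doesn't support lagging list columns this way. Here is a solution using for and set: it uses data.table to compute the correct indices for lagging within groups, but everything else happens inside the for loop. Apart from minor optimizations, is this the best (clean + fast) I can hope for in data.table?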

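library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
# list column: unique carb values within each (cyl, gear) group
dt[, carbsList := list(list(unique(carb))), by = .(cyl, gear)]
# rowLag holds the row number of the previous row within each cyl group
dt[, ':='(rowLag = shift(.I), gearLag = shift(gear)), by = cyl]
# empty list column, filled in by the loop below
dt[, ':='(carbsListLag = list())]
cl_j <- which(names(dt) == "carbsListLag")
for (i in seq_len(nrow(dt))) {
   # copy carbsList from the lagged row; rowLag is NA for the first row of each group
   set(dt, i, cl_j, dt[dt[i, rowLag], list(carbsList)])
}
dt[, .(carb, gear, gearLag, carbsList, carbsListLag, .I, rowLag), by = cyl]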
    cyl carb gear gearLag carbsList carbsListLag  I rowLag
 1:   6    4    4      NA         4         NULL  1     NA
 2:   6    4    4       4         4            4  2      1
 3:   6    1    3       4         1            4  4      2
 4:   6    1    3       3         1            1  6      4
...
13:   4    1    4       4       1,2          1,2 20     19
14:   4    1    3       4         1          1,2 21     20
15:   4    1    4       3       1,2            1 26     21
16:   4    2    5       4         2          1,2 27     26
17:   4    2    5       5         2            2 28     27
18:   4    2    4       5       1,2            2 32     28
19:   8    2    3      NA     2,4,3         NULL  5     NA
20:   8    4    3       3     2,4,3        2,4,3  7      5
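One possible way to avoid the row-by-row loop, offered as a sketch rather than a definitive answer: once rowLag is available, the list column can be indexed by it in a single step. This relies on base R behavior where indexing a list with NA returns a NULL element, which mirrors the NULL entries in the output above.

```r
library(data.table)

dt <- data.table(mtcars)[, .(gear, carb, cyl)]
dt[, carbsList := list(list(unique(carb))), by = .(cyl, gear)]
dt[, rowLag := shift(.I), by = cyl]

# index the whole list column by the lagged row numbers at once;
# the outer list() marks the result as a single list column
dt[, carbsListLag := list(carbsList[rowLag])]
```

Whether this is faster than the set loop on a given data set is an assumption that would need benchmarking.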

Lag a list in R data.table
