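I am trying to use ffbase to subset a very large ffdf object inside a loop, but I get the following error:
Error in UseMethod("as.hi") : no applicable method for 'as.hi' applied to an object of class "NULL"
I am running this code over ssh on a machine with plenty of available memory. Here is the code I am trying to run: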
library(ffbase)
# totalD is an ffdf with columns ID, TS, and TD, each with 288,133,589 rows. ID consists
# of integers. TS is a column of integer timestamps with second precision. TD is of type
# double. Uid3 is an integer vector consisting of the 1205 unique entries of totalD$ID.
# H_times builds a matrix with one row per hour and one column per ID in Uid3; each
# cell holds the sum of TD over that ID's rows within that hour.
H_times <- function(totalD, Uid3) {
  # hours is the number of unique hours of the experiment
  hours <- length(unique(subset(totalD$TS, totalD$TS %% 3600 == 0))) - 1
  # bH (the first full-hour timestamp) is used as a counter in the following loops
  bH <- min(unique(subset(totalD$TS, totalD$TS %% 3600 == 0)))
  # sum_D_matrix is the output
  sum_D_matrix <- matrix(0, nrow = hours, ncol = length(Uid3))
  for (i in seq_along(Uid3)) {
    Bh <- bH
    for (j in seq_len(hours)) {
      sum_D_matrix[j, i] <- sum(subset(totalD$TD, totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))
      Bh <- Bh + 3600
    }
  }
  save(sum_D_matrix, file = "sum_D_matrix")
}
H_times(totalD, Uid3)
I tried to implement the fix that jwijffels suggested in the comments on this question, but to no avail. Thanks in advance!
Answer:
This is caused by the following line:
sum_D_matrix[j,i] <- sum(subset(totalD$TD,
totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))
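For some combinations of hour and ID the selection is empty. One problem with ff is that it cannot handle empty vectors: the length of an ff vector or ffdf must always be >= 1. Perhaps subset.ff should handle this case, but it is not clear what subset.ff should return.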
You can use the following workaround:
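# logical ff vector marking the rows of ID Uid3[i] within the current hour
sel <- totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]
# positions where sel is TRUE, or NULL if there are none
sel <- ffwhich(sel, sel)
if (is.null(sel)) {
  # empty selection: nothing to sum
  sum_D_matrix[j,i] <- 0
} else {
  sum_D_matrix[j,i] <- sum(totalD$TD[sel])
}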
When the resulting vector is empty, ffwhich returns NULL (as I mentioned, it cannot return a vector of length 0).
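The if/else in the workaround can be folded into a small helper. The sketch below is hypothetical (sum_or_zero is not part of ff or ffbase) and has only been checked on base R vectors. It treats both NULL (what ffwhich returns for an empty selection) and a zero-length index (what base R's which() returns) as "nothing to sum".

```r
# Hypothetical helper, not part of ff or ffbase: sum `values` at positions
# `idx`, returning 0 when idx is NULL or has length 0.
sum_or_zero <- function(idx, values) {
  if (is.null(idx) || length(idx) == 0L) {
    return(0)  # empty selection: nothing to sum
  }
  sum(values[idx])
}

sum_or_zero(NULL, c(1.5, 2.5, 4))        # 0
sum_or_zero(integer(0), c(1.5, 2.5, 4))  # 0
sum_or_zero(c(1L, 3L), c(1.5, 2.5, 4))   # 5.5
```

Inside the loop this would read sum_D_matrix[j, i] <- sum_or_zero(ffwhich(sel, sel), totalD$TD). That assumes indexing and sum() behave on ff inputs the same way they do in the answer's totalD$TD[sel], which is not tested here.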
Side note
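The way you are using subset is actually a bit unusual. One of the reasons to use subset is to simplify the notation by dropping all the totalD$ prefixes. The more "usual" way to use it is: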
sum_D_matrix[j,i] <- sum(subset(totalD, TS >= Bh & TS < (Bh + 3600) & ID == Uid3[i],
TD, drop=TRUE))
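Apart from notation, the nested loop evaluates the comparisons over all 288,133,589 rows once per (hour, ID) pair. If the data fits in memory as an ordinary data.frame, the same matrix can be built in one pass. The sketch below assumes that. hourly_sums is a hypothetical helper, not part of ff or ffbase, and the default argument of tapply() requires R >= 3.4.0. Because every (hour, ID) cell gets a fixed factor level, cells with no matching rows come out as 0, and no special empty-selection handling is needed.

```r
# Sketch only: hourly_sums() is a hypothetical helper, not part of ff/ffbase.
# Assumes `d` is an ordinary in-memory data.frame with columns ID, TS (integer
# seconds) and TD (numeric), and `ids` holds unique IDs (the role of Uid3).
hourly_sums <- function(d, ids) {
  hour_starts <- unique(d$TS[d$TS %% 3600 == 0])
  bH <- min(hour_starts)            # first full-hour timestamp, as in H_times
  hours <- length(hour_starts) - 1  # same hour count as H_times
  # Integer hour bucket: bucket j covers [bH + 3600 * (j - 1), bH + 3600 * j)
  h <- as.integer((d$TS - bH) %/% 3600) + 1L
  keep <- h >= 1L & h <= hours & d$ID %in% ids
  # Fixed factor levels give every (hour, ID) pair a cell; default = 0
  # fills cells that have no matching rows instead of NA
  agg <- tapply(d$TD[keep],
                list(factor(h[keep], levels = seq_len(hours)),
                     factor(d$ID[keep], levels = ids)),
                sum, default = 0)
  out <- matrix(0, nrow = hours, ncol = length(ids))
  out[] <- agg                      # copy the values, dropping the dimnames
  out
}
```

Applied as hourly_sums(as.data.frame(totalD), Uid3), this would need the three columns in RAM (about 4.6 GB at 4 + 4 + 8 bytes per row, before intermediate vectors). Treat it as an illustration of the one-pass idea rather than a drop-in replacement for the ff code.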