Subsetting an ffdf in a loop

Time: 2014-07-03 06:39:36

Tags: r ff ffbase

I am trying to use ffbase to subset a very large ffdf object inside a loop, but I get this error message:

Error in UseMethod("as.hi") : no applicable method for 'as.hi' applied to an object of
class "NULL"

I am running this code on a remote machine over ssh with plenty of available memory. Here is the code I am trying to run:

# totalD is an ffdf with columns ID, TS, and TD, each with 288,133,589 rows. ID consists
# of integers. TS is a column of integer timestamps with second precision. TD is of type
# double. Uid3 is an integer vector consisting of the 1205 unique entries of totalD$ID.

# H_times creates a matrix of the sum of the entries in TD traveled in each hour
H_times <- function(totalD, Uid3) {

    # hours is the number of unique hours of the experiment
    hours <- length(unique(subset(totalD$TS, totalD$TS %% 3600 == 0)))-1

    # bH is the first full-hour timestamp; the loops below start counting from it
    bH <- min(unique(subset(totalD$TS, totalD$TS %% 3600 == 0)))

    # sum_D_matrix is the output
    sum_D_matrix <- matrix(0, nrow = hours, ncol = length(Uid3))

    for(i in 1:length(Uid3)) {
        Bh <- bH
        for(j in 1:hours) {
            sum_D_matrix[j,i] <- sum(subset(totalD$TD, totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))
            Bh <- Bh + 3600
        }
    }
    save(sum_D_matrix, file = "sum_D_matrix")
}

H_times(totalD, Uid3)

I tried to implement the fix that jwijffels suggested in the comments on this question, but to no avail. Thanks in advance!

1 answer:

Answer 0 (score: 0)

This is caused by the following line:

sum_D_matrix[j,i] <- sum(subset(totalD$TD, 
    totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]))

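The selection can be empty. One problem with ff is that it cannot handle empty vectors: the size of a vector or ffdf must always be >= 1. Perhaps subset.ff should handle this case, but it is not clear what subset.ff should return when nothing is selected.

You can use the following workaround: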
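
For comparison, here is a minimal base R sketch (ordinary in-memory vectors, not ff) of the same kind of selection. It suggests why the loop body would not fail with plain vectors: an empty selection yields a zero-length vector, and its sum is 0. The toy values are made up for illustration and do not come from the question.

```r
# Toy in-memory data (illustrative values, not from the question)
TS <- c(0L, 10L, 3600L, 3700L)
TD <- c(1.5, 2.5, 4.0, 6.0)
ID <- c(1L, 1L, 2L, 2L)

# ID 1 has no rows in the second hour, so the selection is empty
sel <- TS >= 3600L & TS < 7200L & ID == 1L
empty <- TD[sel]

length(empty)  # 0
sum(empty)     # 0
```

With ff there is no zero-length result to sum, which is where the error comes from.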

sel <- totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i]
sel <- ffwhich(sel, sel)
if (is.null(sel)) {
  sum_D_matrix[j,i] <- 0
} else {
  sum_D_matrix[j,i] <- sum(totalD$TD[sel])
}

ffwhich returns NULL when the resulting vector is empty (as mentioned, it cannot return a vector of length 0).
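
The workaround can be wrapped in a small helper so the loop body stays short. This is only a sketch: `sum_where` is a hypothetical name, not part of ff or ffbase, and the toy ffdf is made up. The extra `length(idx) == 0` check is an assumption added in case a given ffbase version returns a zero-length index instead of NULL.

```r
library(ff)
library(ffbase)

# Hypothetical helper (not part of ffbase): sum the ff vector `x` over
# the positions where the logical ff vector `sel` is TRUE.
sum_where <- function(x, sel) {
  idx <- ffwhich(sel, sel)
  # Empty selection: NULL per the answer above; the length check is a
  # defensive assumption for versions that might behave differently.
  if (is.null(idx) || length(idx) == 0) return(0)
  sum(x[idx])
}

# Toy ffdf with the same column names as the question (values made up)
toy <- as.ffdf(data.frame(ID = c(1L, 1L, 2L),
                          TS = c(0L, 10L, 3600L),
                          TD = c(1.5, 2.5, 4.0)))

# First hour, ID 1: two matching rows
first_hour <- sum_where(toy$TD, toy$TS >= 0L & toy$TS < 3600L & toy$ID == 1L)

# Second hour, ID 1: no matching rows
second_hour <- sum_where(toy$TD, toy$TS >= 3600L & toy$TS < 7200L & toy$ID == 1L)
```

Inside the original loop, the assignment would then read `sum_D_matrix[j,i] <- sum_where(totalD$TD, totalD$TS >= Bh & totalD$TS < (Bh + 3600) & totalD$ID == Uid3[i])`.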

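Side note

The way you use subset is actually a bit odd. One reason to use subset is to simplify the notation by dropping all the totalD$ prefixes. A more "usual" way to use it would be: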

sum_D_matrix[j,i] <- sum(subset(totalD, TS >= Bh & TS < (Bh + 3600) & ID == Uid3[i], 
    TD, drop=TRUE))