Question

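I found that my raw data has a field that keeps a running total of pellet consumption. However, whenever the number ends in 0, that zero was omitted, so I want to go back and fix this. This is what my data looks like: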

library(data.table)
accumulatoin_calc <- as.data.table(read.table('pellet_data.csv', header = TRUE, sep = ";"))
head(accumulatoin_calc[,.(Date_Hour_PC_EEST, Pellets_used_kilograms)])
         Date_Hour_PC_EEST Pellets_used_kilograms
    1: 2016-01-03 09:37:16                      19348
    2: 2016-01-03 09:37:21                      19349
    3: 2016-01-03 09:37:26                      1934
    4: 2016-01-03 09:37:31                      19341
    5: 2016-01-03 09:37:36                      19342
    6: 2016-01-03 09:37:41                      19343

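I wrote a script that goes through my data and checks whether the ratio of the previous value to the current one (here 19349/1934) is greater than 1. If it is, that row has a problem. The number of digits before the decimal point of that ratio tells us how many zeros are missing. For example, 19349/1935 = 9.999483, which means only one zero is missing. If the ratio were 99.0082, it would mean two zeros were omitted.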
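A minimal sketch comparing two ways of turning the ratio into a count of missing zeros; the helper names `zeros_by_digits` and `zeros_by_log10` are hypothetical. Counting integer digits works for 9.999483, but it overcounts when the ratio lands just above a power of ten, which is what happens for the row actually shown above: 19349/1934 ≈ 10.0047 has two integer digits although only one zero is missing. Rounding the base-10 logarithm of the ratio picks the nearest power of ten instead:

```r
# Hypothetical helpers: estimate how many trailing zeros were dropped from
# `cur`, given the previous (correct) reading `prev`.

# Digit-count rule described in the question
zeros_by_digits <- function(prev, cur) nchar(floor(prev / cur))

# Nearest power of ten of the ratio
zeros_by_log10 <- function(prev, cur) round(log10(prev / cur))

zeros_by_digits(19349, 1935)  # ratio 9.999483 -> 1
zeros_by_log10(19349, 1935)   # -> 1
zeros_by_digits(19349, 1934)  # ratio 10.0047  -> 2 (overcounts)
zeros_by_log10(19349, 1934)   # -> 1
```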

# work on a plain vector: assigning into the data.table column inside the loop copies data on every iteration
x <- accumulatoin_calc$Pellets_used_kilograms
for (j in 2:length(x)) {
        # a reading smaller than the previous (already corrected) one has lost trailing zeros
        if (x[j] > 0 && x[j-1] / x[j] > 1) {
                # number of omitted zeros; round(log10()) instead of counting integer digits,
                # which gives 2 for 19349/1934 = 10.0047 although only one zero is missing
                digits <- round(log10(x[j-1] / x[j]))
                # append all missing zeros at once (the original loop always appended exactly one)
                x[j] <- x[j] * 10^digits
                cat("\nTransforming row ", j, " out of ", length(x), " rows where 0s are omitted in the original file.")
        }
}
accumulatoin_calc[, Pellets_used_kilograms := x]

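Unfortunately, this script is quite slow and probably not very efficient, because my data has about 10 million data points. Can you suggest a more efficient, faster approach?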
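One possible vectorized alternative, sketched under stated assumptions: compare every reading with the largest raw value in all earlier rows (`cummax`). That reference does not depend on corrections made to earlier rows, so the whole column can be fixed with vector arithmetic instead of a row-by-row loop, and runs of consecutive truncated readings (e.g. 1934, 1934) are still caught. `fix_trailing_zeros` is a hypothetical helper. It assumes the first reading is correct, the column has no `NA`s, and a genuine reading never falls below the earlier maximum by a factor of √10 (about 3.16) or more, so small dips such as 19349 followed by 19341 in the sample get a correction of zero.

```r
# Hypothetical helper: restore trailing zeros dropped from a running counter.
# Assumptions: first reading correct, no NAs, genuine readings never fall
# below the largest earlier reading by a factor of sqrt(10) or more.
fix_trailing_zeros <- function(x) {
  n <- length(x)
  # reference for row j = largest raw value in rows 1..j-1 (NA for row 1);
  # truncated readings are small, so they never raise this running maximum
  ref <- c(NA, cummax(x)[-n])
  # only positive readings below the reference can have lost zeros
  bad <- !is.na(ref) & x > 0 & x < ref
  # number of dropped zeros = nearest power of ten of the ratio
  # (0 for small genuine dips, which therefore stay unchanged)
  k <- round(log10(ref[bad] / x[bad]))
  x[bad] <- x[bad] * 10^k
  x
}

pellets <- c(19348, 19349, 1934, 19341, 19342, 19343)
fix_trailing_zeros(pellets)
# [1] 19348 19349 19340 19341 19342 19343
```

On the data.table itself this could be applied in place with `accumulatoin_calc[, Pellets_used_kilograms := fix_trailing_zeros(Pellets_used_kilograms)]`. Because `cummax`, the comparisons and `log10` each make a single pass over the vector in compiled code, this should scale to about 10 million rows much better than a row-by-row R loop.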

Adding 0s to a data.table where trailing 0s were omitted
