Difference between rowMeans() and apply(.., mean) on a data.table

Asked: 2014-08-19 15:23:02

Tags: r data.table apply mean

Given the following data.table:

DT <- as.data.table(
  cbind(PREC_01N=c(0.0,0.25,2.29,9.77,26.00,0.93,0.00,5.54,9.91,0.00,0.01,0.0), 
        PREC_01P=c(1.73,0.00,0.01,7.55,0.00,0.11,65.09,13.60,7.09,13.87,5.15,0.87),
        PREC_02N=c(0.0,0.26,0.00,9.58,1.50,2.46,0.03,4.94,0.00,1.53,6.11,0.02),
        PREC_02P=c(0.33,57.20,10.95,2.89,0.81,2.59,0.00,4.63,11.05,1.53,10.43,1.98),
        PREC_03N=c(1.26,0.04,0.00,27.25,0.00,3.87,0.01,0.48,17.73,0.05,12.14,0.02),
        PREC_03P=c(0.21,5.74,0.00,1.59,23.35,1.36,0.00,3.75,6.14,0.37,0.00,0.00),
        PREC_04N=c(0.00,0.34,1.52,15.20,0.00,3.43,0.07,0.00,0.01,15.12,25.55,0.04),
        PREC_04P=c(5.42,9.13,20.64,12.68,35.68,27.05,0.00,0.02,0.00,1.60,0.00,0.67),
        PREC_05N=c(0.03,3.56,0.08,9.98,0.01,3.94,0.32,0.00,15.58,0.01,0.00,0.00),
        PREC_05P=c(0.21,0.02,57.97,0.01,0.00,4.31,0.00,1.55,13.03,0.07,54.75,0.78),
        PREC_06N=c(0.19,4.08,0.10,12.22,0.00,0.72,0.03,0.09,15.19,0.01,9.29,0.18),
        PREC_06P=c(0.05,0.59,0.29,6.65,35.56,14.02,0.02,0.38,13.46,0.00,1.07,0.00),
        PREC_07N=c(0.42,4.50,11.36,3.34,4.04,0.02,0.03,0.00,1.66,0.00,9.44,0.00),
        PREC_07P=c(0.35,10.37,13.12,13.24,8.29,30.73,0.72,0.01,9.74,0.75,5.77,0.00),
        PREC_AVN=c(1.26,0.00,16.92,13.09,1.43,6.13,0.00,12.10,8.23,1.00,7.99,0.00)
  ))

To test, I create two columns, each holding the row-wise mean of the 15 columns, using two different methods:

DT[,PREC_MEAN:=rowMeans(DT[,1:15,with=FALSE])]         # Create column PREC_MEAN - FASTER
DT[,PREC_MEAN2:=apply(DT[,1:15,with=FALSE], 1, mean)]  # Create column PREC_MEAN2 - SLOWER

To my surprise, the two columns differ in some rows:

identical(DT$PREC_MEAN, DT$PREC_MEAN2)             # FALSE ?????
DTbad <- DT$PREC_MEAN != DT$PREC_MEAN2             # Logical vector 
sum(DTbad)                                         # 10 inequalities????
DT <- cbind(ROWID=1:nrow(DT),DT)                   # Adding a ROWID col to create the IDENTICAL column
DT[,IDENTICAL:=identical(PREC_MEAN, PREC_MEAN2), by=ROWID]  # By the way, is there another easier way?
10 of the 12 rows show different MEAN values!
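On the side question in the comment above (an easier way to build the IDENTICAL column), here is a minimal sketch, not a confirmed answer. It assumes a small made-up table `toy` with illustrative values rather than the real data, and the column names `m1`, `m2`, `SAME` and `CLOSE` are hypothetical. Because `==` is already vectorized, no ROWID column or `by=` grouping should be needed, and a tolerance-based check avoids flagging differences that are only floating-point rounding.

```r
library(data.table)

# Hypothetical stand-in for DT; the values are illustrative only
toy <- data.table(a = c(0.10, 0.25, 1.73),
                  b = c(0.20, 0.26, 0.33),
                  c = c(0.30, 5.74, 0.21))
cols <- c("a", "b", "c")

# Same two approaches as in the question, restricted to the data columns
toy[, m1 := rowMeans(.SD), .SDcols = cols]
toy[, m2 := apply(.SD, 1, mean), .SDcols = cols]

# Element-wise exact comparison: vectorized, so no by = ROWID is required
toy[, SAME := m1 == m2]

# Tolerance-based comparison; 1e-12 is an arbitrary threshold chosen for this sketch
toy[, CLOSE := abs(m1 - m2) < 1e-12]

# all.equal() compares whole vectors using a relative tolerance
all_close <- isTRUE(all.equal(toy$m1, toy$m2))
```

If `CLOSE` is TRUE wherever `SAME` is FALSE, the two columns differ only at the level of rounding error.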

DT[, list(PREC_MEAN, PREC_MEAN2, IDENTICAL)]          # What is different?
DT[, list(format(PREC_MEAN, scientific = TRUE),format(PREC_MEAN2, scientific = TRUE), IDENTICAL)]  # Trying via scientific notation
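The default print and `format(..., scientific = TRUE)` show only about 7 significant digits, so any difference in the last bits of a double stays hidden. The sketch below, using a hypothetical three-value row rather than the real data, prints 17 significant digits and the raw difference. Differences of this size are what one would expect if the two functions accumulate the same sum in slightly different ways, although that cause is an assumption here, not something the question confirms.

```r
# Hypothetical row of values, not taken from DT
x <- c(0.1, 0.2, 0.3)

m_row   <- rowMeans(matrix(x, nrow = 1))  # mean computed via rowMeans()
m_apply <- mean(x)                        # what apply(.., 1, mean) calls for each row

# 17 significant digits are enough to distinguish any two different doubles
shown <- sprintf("%.17g", c(m_row, m_apply))
print(shown)

# Raw difference: exactly zero, or on the order of machine epsilon times the mean
print(m_row - m_apply)
print(.Machine$double.eps)
```

The same `sprintf("%.17g", ...)` call can be applied to `DT$PREC_MEAN` and `DT$PREC_MEAN2` to see where the two columns actually diverge.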

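DT is a subset of a 572,400 x 66 data.table. Running the same procedure on the full table showed the 10 differences reproduced here; I added 2 well-behaved rows, the first and the last.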

Does anyone know what is going on? Why do these differences occur?

Thanks in advance.

0 answers:

No answers yet