Saddle Frame: what is the most idiomatic way to count NaN values?

Time: 2017-09-25 11:30:11

Tags: scala saddle

I build a Saddle Frame in Scala like this:

import org.saddle._
import scala.util.Random

val rowIx = Index(0 until 200)
val colIx = Index(0 until 100)

// create example having 15% of NaNs
val nanPerc = 0.15
val nanLength = math.round(nanPerc*rowIx.length*colIx.length).toInt
val nanInd = Random.shuffle(0 until rowIx.length*colIx.length).take(nanLength)
val rawMat = mat.rand(rowIx.length, colIx.length)
// contents gives a single flat array in row-major order
val rawMatContents = rawMat.contents
nanInd foreach { i => rawMatContents.update(i, Double.NaN) }

val df = Frame(rawMat, rowIx, colIx)

// Now I'd like to check that the number of NaNs is correct, but most Frame
// functions suited to this purpose, e.g. countif, exclude NaNs.
df.???

What is the most idiomatic (Scala, Saddle) way to count the number of NaNs?

2 answers:

Answer 0 (score: 1)

Frame.countif is implemented as

def countif(test: T => Boolean)(implicit ev: S2Stats): Series[CX, Int] = frame.reduce(_.countif(test))

Vec.countif is implemented as

def countif(test: Double => Boolean): Int = r.filterFoldLeft(t => sd.notMissing(t) && test(t))(0)((a,b) => a + 1)
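
To see why a countif-style call cannot report NaNs, here is a minimal plain-Scala model of the quoted logic. It does not use Saddle: countIfModel is a hypothetical stand-in for Vec.countif, and !x.isNaN stands in for sd.notMissing, on the assumption that a missing Double is represented as NaN.

```scala
object CountIfModel {
  // Hypothetical stand-in for Vec.countif (not Saddle code).
  // Assumption: "missing" means NaN, so the test only ever sees non-NaN values.
  def countIfModel(xs: Array[Double])(test: Double => Boolean): Int =
    xs.foldLeft(0)((acc, x) => if (!x.isNaN && test(x)) acc + 1 else acc)
}
```

With this model, CountIfModel.countIfModel(Array(0.2, Double.NaN, 0.9))(_.isNaN) evaluates to 0, because the NaN is filtered out before the test is applied.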

We can use the same approach, but drop test and invert the NaN check:

vec.filterFoldLeft(x => x.isNaN)(0)((a, b) => a + 1)

To run this on a Frame:

frame.reduce(_.filterFoldLeft(x => x.isNaN)(0)((a, b) => a + 1))
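
Going by the quoted Frame.countif signature, a column-wise reduce with an Int-valued function yields one count per column (Series[CX, Int]) rather than a single number, so a grand total would still need a sum over that result. The shape of the computation can be sketched in plain Scala without Saddle; here a Seq of arrays is a hypothetical stand-in for the Frame's columns, and all names are illustrative.

```scala
object NanPerColumn {
  // Count NaNs in one column with a fold, mirroring the shape of the expression above.
  def nanCount(col: Array[Double]): Int =
    col.foldLeft(0)((acc, x) => if (x.isNaN) acc + 1 else acc)

  // One count per column, analogous to the per-column result of a column-wise reduce.
  def nanCounts(cols: Seq[Array[Double]]): Seq[Int] = cols.map(nanCount)

  // Grand total across all columns.
  def totalNaNs(cols: Seq[Array[Double]]): Int = nanCounts(cols).sum
}
```

For two columns Array(1.0, Double.NaN) and Array(Double.NaN, Double.NaN), nanCounts gives Seq(1, 2) and totalNaNs gives 3.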

Answer 1 (score: 0)

I found a very simple and direct way:

retDf.toMat.contents.filter(x => x.isNaN).length
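
The same flat-array check can be tried without Saddle. The sketch below is an illustration only: FlatNanCheck, withNaNs and countNaNs are hypothetical names, and a plain Array[Double] stands in for the contents of the matrix. It rebuilds the question's setup (a fixed fraction of cells overwritten with NaN) and counts them with count, which avoids the intermediate array that filter followed by length allocates.

```scala
import scala.util.Random

object FlatNanCheck {
  // Build a flat rows*cols array of random doubles and overwrite a fraction with NaN.
  // Returns the array together with the number of NaNs that were injected.
  def withNaNs(rows: Int, cols: Int, nanPerc: Double, seed: Long): (Array[Double], Int) = {
    val rng = new Random(seed)
    val n = rows * cols
    val nanLength = math.round(nanPerc * n).toInt
    val data = Array.fill(n)(rng.nextDouble())
    // Distinct indices, so exactly nanLength cells become NaN.
    rng.shuffle((0 until n).toList).take(nanLength).foreach(i => data(i) = Double.NaN)
    (data, nanLength)
  }

  // Count NaNs directly, without building a filtered copy first.
  def countNaNs(data: Array[Double]): Int = data.count(_.isNaN)
}
```

With the question's parameters (200 rows, 100 columns, 15% NaNs), countNaNs on the generated array should equal the injected count returned alongside it.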