Question

This is what I have tried. Depending on what the user passes to the function, I want to add either a String or a Double to the new Chunk.

package org.apache.spark.h2o.utils

import water.fvec.{NewChunk, Frame, Chunk}
import water._
import water.parser.ValueString

class ReplaceNa[T >: Any](a: T) extends MRTask{
  override def map(c: Chunk, nc: NewChunk): Unit = {
    for (row <- 0 until c.len()) {

        a match{
             case s: ValueString if(c.isNA(row)) => nc.addStr(s)           
             case d: Double      if(c.isNA(row)) => nc.addNum(d)

      }
    }
  }
}

But I get this error:

 error: value outputFrame is not a member of Nothing
          pred.add(new ReplaceNa(3).doAll(1, pred.vec(4)).outputFrame(Array("s"), null))

Thanks for your help!

Answer 1

I have several comments:

Check for NA outside the match (switch) branches.
You are missing the non-NA case, so you generate a vector that is shorter than the input vector (I expect you want a vector of the same length).

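Regarding generics, you need to provide type specialization. The `outputFrame` error most likely comes from extending `MRTask` without its type parameter: `doAll` returns that type, and Scala infers it as `Nothing`. Extend `MRTask[ReplaceNA[T]]` instead, and pick the add method through an implicit per-type instance. For example, something like the following snippet: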

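import water.MRTask
import water.fvec.{Chunk, NewChunk}
import water.parser.ValueString

// Replaces every NA in the input chunk with `value`. How the value is written
// is delegated to the TAdd[T] instance resolved at compile time.
class ReplaceNA[T](val value: T)(implicit add: TAdd[T]) extends MRTask[ReplaceNA[T]] {
  override def map(c: Chunk, nc: NewChunk): Unit = {
    for (row <- 0 until c.len()) {
      if (c.isNA(row)) {
        // Replace NAs by the given value
        add.addValue(nc, value)
      } else {
        // Placeholder: emit the original value here instead. addNA() only
        // keeps the output the same length as the input.
        nc.addNA()
      }
    }
  }
}

// Type class: describes how to append a value of type T to a NewChunk
trait TAdd[T] extends Serializable {
  def addValue(nc: NewChunk, value: T): Unit
}

// Instances live in the companion object so they are found implicitly
object TAdd extends Serializable {
  implicit val addDouble: TAdd[Double] = new TAdd[Double] { def addValue(nc: NewChunk, value: Double): Unit = nc.addNum(value) }
  implicit val addFloat: TAdd[Float] = new TAdd[Float] { def addValue(nc: NewChunk, value: Float): Unit = nc.addNum(value) }
  implicit val addValueString: TAdd[ValueString] = new TAdd[ValueString] { def addValue(nc: NewChunk, value: ValueString): Unit = nc.addStr(value) }
}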
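To try the pattern without H2O on the classpath, here is a minimal self-contained sketch of the same idea. The names `FakeChunk`, `Adder` and `FillMissing` are hypothetical stand-ins invented for this illustration (they are not part of H2O), and missing cells are modelled as `None` rather than H2O NAs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for NewChunk: records numbers as Left, strings as Right.
final class FakeChunk {
  val cells: ArrayBuffer[Either[Double, String]] = ArrayBuffer.empty
  def addNum(d: Double): Unit = cells += Left(d)
  def addStr(s: String): Unit = cells += Right(s)
}

// Type class: how to append a value of type T to a FakeChunk.
trait Adder[T] extends Serializable {
  def addValue(nc: FakeChunk, value: T): Unit
}

// Instances in the companion object are found by implicit search.
object Adder {
  implicit val addDouble: Adder[Double] = new Adder[Double] {
    def addValue(nc: FakeChunk, value: Double): Unit = nc.addNum(value)
  }
  implicit val addString: Adder[String] = new Adder[String] {
    def addValue(nc: FakeChunk, value: String): Unit = nc.addStr(value)
  }
}

// Copies present cells and replaces missing ones (None) with `value`,
// so the output always has the same length as the input.
final class FillMissing[T](value: T)(implicit add: Adder[T]) {
  def run(input: Seq[Option[T]]): FakeChunk = {
    val out = new FakeChunk
    input.foreach {
      case Some(v) => add.addValue(out, v)
      case None    => add.addValue(out, value)
    }
    out
  }
}
```

For example, `new FillMissing(0.0).run(Seq(Some(1.5), None))` fills the gap with a number and `new FillMissing("n/a").run(Seq(None, Some("x")))` fills it with a string. Note that `new FillMissing(3)` does not compile in this sketch, because no `Adder[Int]` instance exists; the same should hold for `new ReplaceNA(3)` above, so pass `3.0` or add an instance for `Int`.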

How to parameterize a class and implement a method depending on the type in Scala
