Scala object throwing build/training errors

Posted: 2016-06-09 15:28:10

Tags: scala predictionio

I need some help understanding the errors generated by the Scala class in RandomForestAlgorithm.scala (https://github.com/PredictionIO/PredictionIO/blob/develop/examples/scala-parallel-classification/custom-attributes/src/main/scala/RandomForestAlgorithm.scala).

I built the project in PredictionIO as-is (the custom-attributes variant of the classification template), and pio build fails with:

hduser@hduser-VirtualBox:~/PredictionIO/classTest$ pio build --verbose
[INFO] [Console$] Using existing engine manifest JSON at /home/hduser/PredictionIO/classTest/manifest.json
[INFO] [Console$] Using command '/home/hduser/PredictionIO/sbt/sbt' at the current working directory to build.
[INFO] [Console$] If the path above is incorrect, this process will fail.
[INFO] [Console$] Uber JAR disabled. Making sure lib/pio-assembly-0.9.5.jar is absent.
[INFO] [Console$] Going to run: /home/hduser/PredictionIO/sbt/sbt  package assemblyPackageDependency
[INFO] [Console$] [info] Loading project definition from /home/hduser/PredictionIO/classTest/project
[INFO] [Console$] [info] Set current project to template-scala-parallel-classification (in build file:/home/hduser/PredictionIO/classTest/)
[INFO] [Console$] [info] Compiling 1 Scala source to /home/hduser/PredictionIO/classTest/target/scala-2.10/classes...
[INFO] [Console$] [error] /home/hduser/PredictionIO/classTest/src/main/scala/RandomForestAlgorithm.scala:28: class RandomForestAlgorithm needs to be abstract, since method train in class P2LAlgorithm of type (sc: org.apache.spark.SparkContext, pd: com.test1.PreparedData)com.test1.PIORandomForestModel is not defined
[INFO] [Console$] [error]  class RandomForestAlgorithm(val ap: RandomForestAlgorithmParams) // CHANGED
[INFO] [Console$] [error]        ^
[INFO] [Console$] [error] one error found
[INFO] [Console$] [error] (compile:compile) Compilation failed
[INFO] [Console$] [error] Total time: 6 s, completed Jun 8, 2016 4:37:36 PM
[ERROR] [Console$] Return code of previous step is 1. Aborting.
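
Reading the compiler message literally, the train method it says is missing would have the shape below. This is only a reconstruction from the error output above, not something copied from the PredictionIO sources, and the trait name is made up purely to show the signature:

import org.apache.spark.SparkContext

// Reconstructed from the compiler message: P2LAlgorithm appears to declare an
// abstract train that takes a SparkContext in addition to the prepared data.
trait ExpectedTrainSignature {
  def train(sc: SparkContext, pd: PreparedData): PIORandomForestModel
}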

So when I followed the line the error points to and made the class abstract:

// extends P2LAlgorithm because the MLlib's RandomForestModel doesn't
// contain RDD.
abstract class RandomForestAlgorithm(val ap: RandomForestAlgorithmParams) // CHANGED
  extends P2LAlgorithm[PreparedData, PIORandomForestModel, // CHANGED
    Query, PredictedResult] {

  def train(data: PreparedData): PIORandomForestModel = { // CHANGED
    // CHANGED
    // Empty categoricalFeaturesInfo indicates all features are continuous.
    val categoricalFeaturesInfo = Map[Int, Int]()
    val m = RandomForest.trainClassifier(
      data.labeledPoints,
      ap.numClasses,
      categoricalFeaturesInfo,
      ap.numTrees,
      ap.featureSubsetStrategy,
      ap.impurity,
      ap.maxDepth,
      ap.maxBins)
    new PIORandomForestModel(
      gendersMap = data.gendersMap,
      educationMap = data.educationMap,
      randomForestModel = m
    )
  }

pio build then succeeds, but training fails because it cannot instantiate a new instance of the model:

[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(6))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[WARN] [Utils] Your hostname, hduser-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.0.2.15:59444]
[WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
Exception in thread "main" java.lang.InstantiationException
    at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at io.prediction.core.Doer$.apply(AbstractDoer.scala:52)
    at io.prediction.controller.Engine$$anonfun$1.apply(Engine.scala:171)
    at io.prediction.controller.Engine$$anonfun$1.apply(Engine.scala:170)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at io.prediction.controller.Engine.train(Engine.scala:170)
    at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:65)
    at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:247)
    at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
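
The top of the stack trace shows the algorithm being constructed reflectively (Doer$.apply calling Constructor.newInstance), and reflection cannot instantiate an abstract class. A tiny standalone snippet reproduces the same exception; the class name is hypothetical and unrelated to PredictionIO:

// Minimal reproduction: Constructor.newInstance throws
// java.lang.InstantiationException when the target class is abstract.
abstract class Probe

object InstantiationCheck extends App {
  val ctor = classOf[Probe].getDeclaredConstructor()
  ctor.newInstance() // throws java.lang.InstantiationException
}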

So, two questions:

1. Why is the following model definition not taken into account during the build:

class PIORandomForestModel(
  val gendersMap: Map[String, Double],
  val educationMap: Map[String, Double],
  val randomForestModel: RandomForestModel
) extends Serializable
  2. How can PIORandomForestModel be defined in a way that does not throw a pio build error and lets training reassign the object's attributes? (See the sketch after this list.)

I have already posted this question in the PredictionIO Google group but have not received a reply yet. Thanks in advance for any help.
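
For reference, the only direction I can see in the compiler message is to keep the class non-abstract and give train the two-parameter list shown in the reconstruction above. This is an untested sketch, not a confirmed fix; the train body and the predict method would stay exactly as in the template:

import org.apache.spark.SparkContext

// Untested sketch: train now matches the (sc, pd) signature reported by the
// compiler, so the class no longer has to be declared abstract.
class RandomForestAlgorithm(val ap: RandomForestAlgorithmParams)
  extends P2LAlgorithm[PreparedData, PIORandomForestModel, Query, PredictedResult] {

  def train(sc: SparkContext, data: PreparedData): PIORandomForestModel = {
    // ... same body as the train(data) method shown earlier ...
    ???
  }

  // predict(...) unchanged from the template, omitted here
}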

0 answers:

No answers yet.