Scala Spark type mismatch

Time: 2019-02-12 20:42:40

Tags: scala apache-spark rdd

I need to group an RDD by two columns and aggregate the counts. I have this function:

def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic])
: RDD[FeatureTuple] = {

    val grouped_patients = diagnostic
      .groupBy(x => (x.patientID, x.code))
      .map(_._2)
      .map{ events =>
        val p_id = events.map(_.patientID).take(1).mkString
        val f_code = events.map(_.code).take(1).mkString
        val count = events.size.toDouble
        ((p_id, f_code), count)
      }
    // should be of the form:
    //diagnostic.sparkContext.parallelize(List((("patient", "diagnostics"), 1.0)))
}

When I compile it, I get this error:

/FeatureConstruction.scala:38:3: type mismatch;
[error]  found   : Unit
[error]  required: org.apache.spark.rdd.RDD[edu.gatech.cse6250.features.FeatureConstruction.FeatureTuple]
[error]     (which expands to)  org.apache.spark.rdd.RDD[((String, String), Double)]
[error]   }
[error]   ^ 

How can I fix this? I have read this post: Scala Spark type missmatch found Unit, required rdd.RDD, but I don't use collect(), so it didn't help me.

0 Answers
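For reference, the compiler reports Unit because the method body ends with a val definition, which is a statement, not a value; the grouped RDD has to be the method's last expression to become its return value. Below is a minimal sketch of a version that compiles, assuming (as the compiler output suggests) that FeatureTuple expands to ((String, String), Double), and that Diagnostic is a case class with patientID and code fields; both definitions are reconstructed from the question, not taken from the original project:

    import org.apache.spark.rdd.RDD

    object FeatureConstruction {
      // Assumed definitions, inferred from the question and the error output.
      case class Diagnostic(patientID: String, code: String)
      type FeatureTuple = ((String, String), Double)

      def constructDiagnosticFeatureTuple(diagnostic: RDD[Diagnostic]): RDD[FeatureTuple] = {
        // Map each event to a ((patientID, code), 1.0) pair and sum the counts
        // per key. Because this expression is the last one in the method body
        // (not bound to a val), it is the value the method returns, so the
        // result type is RDD[FeatureTuple] rather than Unit.
        diagnostic
          .map(event => ((event.patientID, event.code), 1.0))
          .reduceByKey(_ + _)
      }
    }

The original groupBy version would also compile if grouped_patients were written as the final expression (or the val binding dropped entirely); reduceByKey is used in the sketch only because it combines counts within each partition instead of materializing every group of events.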