Spark unable to find encoder for a case class, even though one is provided

Asked: 2019-04-29 06:34:16

Tags: scala apache-spark

Trying to figure out why I am getting an error about the encoder; any insight would be helpful!

Error: Unable to find encoder for type SolrNewsDocument. An implicit Encoder[SolrNewsDocument] is needed to store SolrNewsDocument instances in a Dataset.

Clearly I have imported spark.implicits._. I have also provided an encoder in the form of a case class.

def ingestDocsToSolr(newsItemDF: DataFrame) = {
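  // Note: the case class is declared inside the method body; the answers below address this.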
  case class SolrNewsDocument(
                             title: String,
                             body: String,
                             publication: String,
                             date: String,
                             byline: String,
                             length: String
                           )
  import spark.implicits._
  val solrDocs = newsItemDF.as[SolrNewsDocument].map { doc =>
    val solrDoc = new SolrInputDocument
    solrDoc.setField("title", doc.title.toString)
    solrDoc.setField("body", doc.body)
    solrDoc.setField("publication", doc.publication)
    solrDoc.setField("date", doc.date)
    solrDoc.setField("byline", doc.byline)
    solrDoc.setField("length", doc.length)

    solrDoc
  }

  // can be used for stream SolrSupport.
  SolrSupport.indexDocs("localhost:2181", "collection", 10, solrDocs.rdd);
  val solrServer = SolrSupport.getCachedCloudClient("localhost:2181")
  solrServer.setDefaultCollection("collection")
  solrServer.commit(false, false)
}

2 Answers:

Answer 0: (Score: 0)

Check this: move the case class declaration before (outside) the function declaration. An encoder can only be derived once the case class definition is visible at top-level scope; Scala cannot generate the required TypeTag for a case class declared inside a method, so the compiler cannot resolve the implicit Encoder[SolrNewsDocument] within the function body.


import spark.implicits._

case class SolrNewsDocument(
  title: String,
  body: String,
  publication: String,
  date: String,
  byline: String,
  length: String
)

def ingestDocsToSolr(newsItemDF: DataFrame) = {
  val solrDocs = newsItemDF.as[SolrNewsDocument]
}
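
If moving the case class is awkward, an explicit encoder can also be passed instead of relying on implicit resolution. A minimal sketch (not part of the original answer), still assuming SolrNewsDocument is declared at top level:

import org.apache.spark.sql.{Dataset, Encoder, Encoders}

// Encoders.product derives an encoder for any top-level Product type (case class),
// which can then be passed explicitly to .as[...] instead of being resolved implicitly.
val solrDocEncoder: Encoder[SolrNewsDocument] = Encoders.product[SolrNewsDocument]
val solrDocs: Dataset[SolrNewsDocument] = newsItemDF.as[SolrNewsDocument](solrDocEncoder)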




Answer 1: (Score: 0)

I ran into this error while trying to iterate over a text file. In my case, as of Spark 2.4.x, the issue was that I had to convert it to an RDD first (this conversion used to happen implicitly):

textFile
  .rdd
  .flatMap(line => line.split(" "))
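
Alternatively, the RDD conversion can be avoided by staying in the Dataset API, as long as the implicits are in scope. A minimal sketch, assuming spark is the active SparkSession and "input.txt" is a placeholder path:

import spark.implicits._  // supplies the implicit Encoder[String]

val words = spark.read.textFile("input.txt")  // Dataset[String]
  .flatMap(line => line.split(" "))           // needs Encoder[String], now in scope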

See also: Migrating our Scala codebase to Spark 2