java.lang.UnsupportedOperationException when collecting a Spark Dataset

Asked: 2018-04-11 14:43:55

Tags: scala apache-spark dataset

I am trying to get 25 rows for each key, as shown below:

import spark.implicits._

val record = file.map(rec => {
  val row = rec.date + "," + rec.registrar + "," + rec.agency + "," + rec.state + "," +
    rec.district + "," + rec.subDistrict + "," + rec.pinCode + "," + rec.gender + "," +
    rec.age + "," + rec.aadharGenerated + "," + rec.rejected + "," + rec.mobileNo + "," +
    rec.email
  (rec.state, row)
}).groupByKey(_._2).mapGroups((a, b) => (a, b.toSet.take(25))).collect()

record.foreach(println)

I tried other solutions, but none of them worked.

Error stack trace:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[(String, String)]
- field (class: "scala.collection.immutable.Set", name: "_2")
- root class: "scala.Tuple2"
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
    at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
    at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
    at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
    at KPI1.Top25$.main(Top25.scala:20)
    at KPI1.Top25.main(Top25.scala)
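
The first line of the trace says no Encoder was found for scala.collection.immutable.Set[(String, String)], which is the type that `b.toSet.take(25)` returns inside `mapGroups`. Note also that `groupByKey(_._2)` groups by the concatenated row string rather than by `rec.state`, which is probably not what "25 rows for each key" intends. Below is a minimal sketch of one possible workaround, assuming the key should be the state and that returning a `Seq[String]` (for which `spark.implicits._` provides an encoder) is acceptable. The names `Top25Sketch`, `Rec`, and `top25PerState` are hypothetical and not from the question, and `Rec` carries only two fields in place of the asker's full record type.

```scala
import org.apache.spark.sql.SparkSession

object Top25Sketch {
  // Hypothetical stand-in for the asker's record type: only the key and the
  // already-concatenated row string are modelled here.
  final case class Rec(state: String, row: String)

  // Returns up to 25 distinct rows per state.
  def top25PerState(spark: SparkSession, recs: Seq[Rec]): Array[(String, Seq[String])] = {
    import spark.implicits._

    recs.toDS()
      .map(rec => (rec.state, rec.row))
      .groupByKey(_._1) // group by state (the first tuple element), not by the row
      .mapGroups { (state, rows) =>
        // Drop the duplicated key, de-duplicate, and keep at most 25 rows.
        // The result is typed as Seq[String] instead of Set[(String, String)].
        // Assumption: each group is small enough to materialize as a List.
        val distinctRows: Seq[String] = rows.map(_._2).toList.distinct.take(25)
        (state, distinctRows)
      }
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("top25-sketch")
      .getOrCreate()

    val sample = Seq(Rec("Delhi", "row-1"), Rec("Delhi", "row-2"), Rec("Goa", "row-3"))
    top25PerState(spark, sample).foreach(println)

    spark.stop()
  }
}
```

`toList.distinct` plays the role of `toSet` here while keeping the result an ordered `Seq`. If duplicates do not matter, `rows.map(_._2).take(25).toList` avoids reading the whole group.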

0 answers:

No answers yet.