I am trying to get 25 rows for each key, like this:
import spark.implicits._
val record = file.map(rec => {
  var row = rec.date + "," + rec.registrar + "," + rec.agency + "," + rec.state + "," +
    rec.district + "," + rec.subDistrict + "," + rec.pinCode + "," + rec.gender + "," +
    rec.age + "," + rec.aadharGenerated + "," + rec.rejected + "," + rec.mobileNo + "," +
    rec.email
  (rec.state, row)
}).groupByKey(_._2)
  .mapGroups((a, b) => (a, b.toSet.take(25)))
  .collect()
record.foreach(println)
I have tried other solutions, but they did not work.
Error stack trace:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for scala.collection.immutable.Set[(String, String)]
- field (class: "scala.collection.immutable.Set", name: "_2")
- root class: "scala.Tuple2"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:598)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:592)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:583)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:583)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:425)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:61)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:274)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:47)
at KPI1.Top25$.main(Top25.scala:20)
at KPI1.Top25.main(Top25.scala)
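The exception points at the Set[(String, String)] returned from mapGroups: spark.implicits._ provides no encoder for Set here, while it does for Seq/List. Below is a minimal sketch of one possible rework, assuming the goal is up to 25 rows per state; it groups on the state key (_._1 rather than _._2) and returns a List, for which an encoder exists. The names file, spark, and the record fields are taken from the snippet above; everything else is illustrative.

import spark.implicits._

// Hypothetical rework: key by state, group on that key, and return a List
// (encodable) instead of a Set (not encodable by spark.implicits here).
val top25PerState = file.map(rec => {
  val row = Seq(rec.date, rec.registrar, rec.agency, rec.state, rec.district,
    rec.subDistrict, rec.pinCode, rec.gender, rec.age, rec.aadharGenerated,
    rec.rejected, rec.mobileNo, rec.email).mkString(",")
  (rec.state, row)                       // key = state, value = CSV row
}).groupByKey(_._1)                      // group by the state key, not the row string
  .mapGroups((state, rows) => (state, rows.map(_._2).take(25).toList))  // List, not Set
  .collect()

top25PerState.foreach(println)

Note that the original toSet also deduplicated rows before taking 25; if that behaviour matters, something like rows.map(_._2).toSet.take(25).toList would keep it while still returning an encodable List.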