I am getting the below exception when printing the DataFrame data returned from mapPartitions:
Caused by: java.io.NotSerializableException: java.lang.Object
Serialization stack:
- object not serializable (class: java.lang.Object, value: java.lang.Object@382eb59a)
- field (class: sql.MapPartitions$$anonfun$1, name: nonLocalReturnKey1$1, type: class java.lang.Object)
- object (class sql.MapPartitions$$anonfun$1, <function1>)
- field (class: org.apache.spark.sql.execution.MapPartitionsExec, name: func, type: interface scala.Function1)
- object (class org.apache.spark.sql.execution.MapPartitionsExec, MapPartitions <function1>, obj#22: org.apache.spark.sql.Row
+- DeserializeToObject createexternalrow(_c0#10.toString, _c1#11.toString, _c2#12.toString, _c3#13.toString, StructField(_c0,StringType,true), StructField(_c1,StringType,true), StructField(_c2,StringType,true), StructField(_c3,StringType,true)), obj#21: org.apache.spark.sql.Row
+- *(1) FileScan csv [_c0#10,_c1#11,_c2#12,_c3#13] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/C:/Users/Vikas Singh/Documents/Vikas/Study/Spark Material/spark/sample_fi..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_c0:string,_c1:string,_c2:string,_c3:string>
)
Here is the code snippet:
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// read the CSV file (no header, so columns arrive as _c0.._c3)
val dataDF = spark.read.format("csv")
  .option("header", "false")
  .load("dataFile.csv")

// schema for the rows that mapPartitions should produce
val schema = StructType(Seq(
  StructField("year", StringType),
  StructField("make", StringType),
  StructField("model", StringType)
))
val encoder = RowEncoder(schema)

val transformed = dataDF.mapPartitions(partition => {
  val recMapPartitions = partition.map(rec => {
    rec.getAs(1)
  })
  return recMapPartitions
})(encoder)
transformed.show()
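One thing I did notice: the serialization stack points at a field nonLocalReturnKey1$1 of type java.lang.Object on my anonymous function, and as far as I understand, an explicit return inside a Scala closure is compiled as a non-local return that captures exactly such a key object in the closure. Would rewriting the closure without return (and producing actual Row objects, since the encoder is a RowEncoder) avoid this? A rough sketch of what I mean (untested; mapping columns 0-2 to year/make/model is just my guess at the intended mapping):

import org.apache.spark.sql.Row

val transformed = dataDF.mapPartitions(partition =>
  // the last expression of the closure is its result; no explicit `return` keyword
  partition.map(rec => Row(rec.getString(0), rec.getString(1), rec.getString(2)))
)(encoder)
transformed.show()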
Can you please help!