Error when saving an ML pipeline: Cannot write a schema with an empty group: message spark_schema

Asked: 2018-09-05 18:53:50

Tags: scala apache-spark apache-spark-ml

I am having trouble saving a Spark ML pipeline. I am using Scala 2.11 and Spark 2.3.1.

import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline().setStages(stages)

// This works
pipeline.save("/tmp/example")

// This does not work
val modelPipeline = pipeline.fit(ds)
modelPipeline.save("/tmp/example")

I get the error below. I have searched the internet for a solution and found some reports of the same error, but those were about writing data to Parquet directly. I don't understand why this error occurs when saving a fitted ML pipeline.

Caused by: org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
}

    at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
    at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
    at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
    at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    ... 8 more
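Since `PipelineModel.save` persists each fitted stage separately (metadata as JSON, model data as Parquet), one way to narrow down which stage produces the empty Parquet group is to save each stage on its own. The snippet below is only a debugging sketch, not from the original post; `saveStagesIndividually` is a hypothetical helper name and `baseDir` a hypothetical path:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.util.MLWritable

// Hypothetical helper: try to save each fitted stage on its own so the
// "empty group" error can be attributed to a single stage.
def saveStagesIndividually(model: PipelineModel, baseDir: String): Unit = {
  model.stages.zipWithIndex.foreach {
    case (stage: MLWritable, i) =>
      try {
        stage.write.overwrite().save(s"$baseDir/stage_$i")
        println(s"Stage $i (${stage.getClass.getSimpleName}) saved OK")
      } catch {
        case e: Exception =>
          println(s"Stage $i (${stage.getClass.getSimpleName}) failed: ${e.getMessage}")
      }
    case (stage, i) =>
      // Stages that are not MLWritable cannot be persisted at all.
      println(s"Stage $i (${stage.getClass.getSimpleName}) is not MLWritable")
  }
}

// Usage (after fitting):
// saveStagesIndividually(modelPipeline, "/tmp/example-stages")
```

Whichever stage fails here is the one writing an empty schema, which should make the problem easier to reproduce in isolation.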

0 Answers:

There are no answers yet.