Apache Spark Parquet: Cannot build an empty group

Time: 2017-05-03 18:18:40

Tags: apache-spark parquet

I am using Apache Spark 2.1.1 (I switched today from 2.1.0, which behaves the same). I have a dataset:

root
|-- muons: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- reco::Candidate: struct (nullable = true)
|    |    |-- qx3_: integer (nullable = true)
|    |    |-- pt_: float (nullable = true)
|    |    |-- eta_: float (nullable = true)
|    |    |-- phi_: float (nullable = true)
|    |    |-- mass_: float (nullable = true)
|    |    |-- vertex_: struct (nullable = true)
|    |    |    |-- fCoordinates: struct (nullable = true)
|    |    |    |    |-- fX: float (nullable = true)
|    |    |    |    |-- fY: float (nullable = true)
|    |    |    |    |-- fZ: float (nullable = true)
|    |    |-- pdgId_: integer (nullable = true)
|    |    |-- status_: integer (nullable = true)
|    |    |-- cachePolarFixed_: struct (nullable = true)
|    |    |-- cacheCartesianFixed_: struct (nullable = true)

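As you can see, this schema contains 3 empty structs. I am 100% sure that I can read and manipulate the dataset in any way I like. However, when I try to write it to disk as Parquet, I get the following exception: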

dsReduced.write.format("parquet").save(outputPathName):

java.lang.IllegalStateException: Cannot build an empty group
at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)

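So, basically, I would like to understand whether this is a bug or expected behavior. I also assume it is related to the empty structs. Any help would be much appreciated!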

Update: I quickly created a stripped-down version, and that version works without any problems! Any insights would be very useful!

VK

1 answer:

Answer 0 (score: 6)

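Parquet cannot write empty structs. A Parquet group must contain at least one field, so converting a struct with no fields to a Parquet schema fails with "Cannot build an empty group".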

For more information, see https://issues.apache.org/jira/browse/SPARK-20593
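One way to locate the offending fields before writing is to walk the schema and collect every struct that has no fields. The following is only a minimal sketch under stated assumptions: `EmptyStructs` and `find` are hypothetical names that do not come from Spark, and the code relies only on the public classes in `org.apache.spark.sql.types`. It reports paths and does not modify the dataset.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): lists the dotted paths of all
// structs with zero fields, descending through structs, arrays and maps.
object EmptyStructs {
  def find(dt: DataType, path: String = ""): Seq[String] = dt match {
    // A struct with no fields is what Parquet refuses to build as a group.
    case s: StructType if s.fields.isEmpty => Seq(path)
    // Non-empty struct: recurse into each field, extending the path.
    case s: StructType =>
      s.fields.toSeq.flatMap { f =>
        val child = if (path.isEmpty) f.name else s"$path.${f.name}"
        find(f.dataType, child)
      }
    // Arrays and maps do not add a name to the path; check their contents.
    case a: ArrayType => find(a.elementType, path)
    case m: MapType   => find(m.keyType, path) ++ find(m.valueType, path)
    // Primitive types cannot contain empty structs.
    case _ => Seq.empty
  }
}
```

For the dataset in the question, calling `EmptyStructs.find(dsReduced.schema)` should report the empty structs nested under `muons`. Those fields would then have to be dropped, or given at least one field, before the Parquet write.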

VK