Getting ClassCastException in Spark 2: java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text

Date: 2017-12-15 12:45:25

Tags: scala hadoop apache-spark struct hive

Getting a ClassCastException in Spark 2 when working with a table that has complex data type columns, such as array<string> and array<struct>.

The action I am trying is simple: a count.

val df = spark.sql("select * from <tablename>")
df.count()

But I get the error below when running the Spark application:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, sandbox.hortonworks.com, executor 1): java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:41)
at org.apache.spark.sql.hive.HiveInspectors$$anonfun$unwrapperFor$23.apply(HiveInspectors.scala:529)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$15.apply(TableReader.scala:419)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:435)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)

Strangely, the same operation on the dataframe works fine from spark-shell.
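A rough diagnostic (a sketch, untested against this exact table) is to count each column separately, so the field whose Hive inspector mismatches the data fails in isolation:

// Diagnostic sketch: count each column on its own so the offending
// column surfaces by itself. <tablename> is a placeholder, as above.
val df = spark.sql("select * from <tablename>")
df.columns.foreach { c =>
  try {
    df.select(c).count()
    println(s"$c: OK")
  } catch {
    case e: Exception => println(s"$c: FAILED -> ${e.getMessage}")
  }
}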

Below are the complex columns of the table:

|-- sku_product: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- sku_id: string (nullable = true)
|    |    |-- qty: string (nullable = true)
|    |    |-- price: string (nullable = true)
|    |    |-- display_name: string (nullable = true)
|    |    |-- sku_displ_clr_desc: string (nullable = true)
|    |    |-- sku_sz_desc: string (nullable = true)
|    |    |-- parent_product_id: string (nullable = true)
|    |    |-- delivery_mthd: string (nullable = true)
|    |    |-- pick_up_store_id: string (nullable = true)
|    |    |-- delivery: string (nullable = true)
|-- hitid_low: string (nullable = true)
|-- evar7: array (nullable = true)
|    |-- element: string (containsNull = true)
|-- hitid_high: string (nullable = true)
|-- evar60: array (nullable = true)
|    |-- element: string (containsNull = true)
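The trace itself is suggestive: WritableStringObjectInspector receives a java.util.ArrayList where it expects org.apache.hadoop.io.Text, i.e. the data contains an array where the table schema declares a string. One rough way to check whether the metastore schema and the on-disk schema disagree, assuming the table's files are Parquet under a known warehouse path (both are assumptions):

// Compare the metastore's view of the table with the schema Spark
// infers directly from the files. Path and format are assumptions.
spark.table("<tablename>").printSchema()
spark.read.parquet("/warehouse/<tablename>").printSchema()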

Please let me know if any further information is needed.

1 Answer:

Answer 0 (score: 0)

I had a similar problem. I was using Spark 2.1 with Parquet files. I found that one of the Parquet files had a schema different from the others, so when I tried to read them all together I got the cast error. To solve it, I simply checked the files one by one.
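A sketch of that file-by-file check (the path /data/mytable is a placeholder, and this assumes the files sit directly under one HDFS directory):

// Print each Parquet file's schema so the odd one out stands out.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val parquetFiles = fs.listStatus(new Path("/data/mytable"))
  .map(_.getPath)
  .filter(_.getName.endsWith(".parquet"))

parquetFiles.foreach { p =>
  println(s"=== $p ===")
  spark.read.parquet(p.toString).printSchema()
}

If the schemas only differ by added columns, spark.read.option("mergeSchema", "true").parquet(...) can reconcile them; a genuine type conflict (string vs. array) still has to be fixed at the source.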