Spark SQL抛出错误“java.lang.UnsupportedOperationException:Unknown field type:void”

时间:2017-12-27 06:38:19

标签: hadoop apache-spark hive apache-spark-sql

我在创建一个列值默认为NULL的表时,在Spark(1.6)SQL中遇到错误。例如:创建表测试为select column_a,NULL为test_temp中的column_b;

同样的事情在Hive中起作用并创建数据类型为“void”的列。

我使用空字符串而不是NULL来避免异常和新列获取字符串数据类型。

有没有更好的方法使用spark sql在hive表中插入空值?

2017-12-26 07:27:59 ERROR StandardImsLogger$:177 - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Unknown field type: void
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:789)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:746)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:428)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
    at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:239)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:238)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:281)
    at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute$1(CreateTableAsSelect.scala:72)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$1(CreateTableAsSelect.scala:47)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:153)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)

1 个答案:

答案 0 :(得分:1)

我无法找到有关数据类型void的更多信息,但它看起来有点等同于我们在Scala中的Any数据类型。

at the end of this page解释了void可以转换为任何其他数据类型。

以下是一些与您面临的问题类似的JIRA问题

因此,正如评论中所提到的,您可以将其转换为任何隐式数据类型而不是NULL

select cast(NULL as string) as column_b