Summary: Inserting a Spark DataFrame into a Hive table results in an error loop and a broken database.
Details:
The error loop:
df.show(5)
df.write.mode(SaveMode.Overwrite).saveAsTable("dbnamexxx.tablenamexxx")
Yields:
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
| zz|zzz|zzzzzz| zz|zzzzzzz| zzzzz_zzzz|zzzzzzzzzz_zzzz|zz_zzzzzzzzzz|zz_zzzzz|
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
|833| 13| 1| 19| 477|2017-11-00 00000000| null| 0| 29|
|833| 3| 1| 13| 280|2017-11-00 00000000| null| 0| 29|
|833| 9| 1| 13| 442|2017-11-00 00000000| null| 0| 29|
|833| 3| 1| 19| 173|2017-11-00 00000000| null| 0| 29|
|833| 14| 1| 17| 360|2017-11-00 00000000| null| 0| 29|
+---+---+------+---+-------+-------------------+---------------+-------------+--------+
(Included just to show the table itself is fine.)
Then the error (repeating every ~2 seconds):
[Stage 5:===> (13 + 4) / 200]2018-03-25 01:12:53 WARN DFSClient:611 - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
2018-03-25 01:13:04 WARN Persist:96 - Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
2018-03-25 01:13:04 ERROR RetryingHMSHandler:173 - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED' cannot accept a NULL value.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:814)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy15.createTable(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1416)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy17.create_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2050)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:97)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:669)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy18.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:468)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:466)
at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:479)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:367)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:243)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:184)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:35)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:37)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:39)
at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:45)
at $line29.$read$$iw$$iw$$iw.<init>(<console>:47)
at $line29.$read$$iw$$iw.<init>(<console>:49)
at $line29.$read$$iw.<init>(<console>:51)
at $line29.$read.<init>(<console>:53)
at $line29.$read$.<init>(<console>:57)
at $line29.$read$.<clinit>(<console>)
at $line29.$eval$.$print$lzycompute(<console>:7)
at $line29.$eval$.$print(<console>:6)
at $line29.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:427)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:423)
at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:111)
at scala.reflect.io.File.applyReader(File.scala:50)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:96)
at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:421)
at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:577)
at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:570)
at scala.tools.nsc.interpreter.ILoop.run$3(ILoop.scala:576)
at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:583)
at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:62)
at scala.tools.nsc.interpreter.ILoop.colonCommand(ILoop.scala:688)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:679)
at scala.tools.nsc.interpreter.ILoop.loadFiles(ILoop.scala:835)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:111)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:76)
at org.apache.spark.repl.Main$.main(Main.scala:56)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
[...]
at org.apache.derby.impl.sql.execute.InsertResultSet.getNextRowCore(Unknown Source)
at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
... 154 more
(Jeez, those error stack traces are long.)
I'm guessing the most relevant lines are:
2018-03-25 01:13:04 WARN Persist:96 - Insert of object
"org.apache.hadoop.hive.metastore.model.MTable@5e251945"
using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,
VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,
VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES
(?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'
cannot accept a NULL value.
(Line breaks inserted by me.)
Corruption of the entire Hive setup:
$ clear ; hive -e "use xxx; show tables;"
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/xxx/bin/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/xxx/bin/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/home/xxx/bin/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Data and some paths have been sanitized.
To restore order, I delete the metastore_db directory and the derby.log file, and then run: schematool -initSchema -dbType derby
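A minimal sketch of that reset as shell commands, assuming the embedded Derby metastore files sit in the directory Spark/Hive was launched from (the default for an embedded metastore) — note that this wipes everything in the metastore:
$ rm -rf metastore_db derby.log          # remove the corrupted embedded Derby metastore and its log
$ schematool -initSchema -dbType derby   # rebuild a fresh metastore schema from scratch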
I started fiddling with this Spark + Hive setup yesterday, so any kind of workaround is welcome.
Thanks in advance!
Answer 0 (score: 0)
After looking around for more alternatives, this points to: apache-hive-1.2.2-bin.tar.gz
So I went back to the following versions:
hadoop-2.7.5.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
apache-hive-2.3.2-bin.tar.gz
Now it works as expected.
Previously I was using:
hadoop-3.0.0.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
These were the latest versions available at the time of this post.
Good luck!
Answer 1 (score: 0)
I ran into the same problem with Spark 2.3.
This error is fixed in the latest Hive 3.0, released in May 2018, which contains the SQL fix HIVE-18046 targeting exactly this error. Note that the fix was only released in Hive 3.0 and in a separate SQL upgrade package, which you can check from here. Hive versions released before May 2018 will not contain this fix.
If you are not on the latest Hive, you may need to manually run the SQL that fixes this error against the metastore.
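For illustration only, a hedged sketch of what such a manual fix might look like on a Derby-backed metastore, inferred from the HIVE-18046 summary (default IS_REWRITE_ENABLED to false instead of NULL) — the authoritative statements are in the official schema upgrade scripts, and the exact names, quoting, and default value may differ in your metastore:
-- Hypothetical Derby DDL: give TBLS.IS_REWRITE_ENABLED a default value so that
-- inserts which omit the column (as the failing INSERT above does)
-- no longer violate its NOT NULL constraint.
ALTER TABLE "APP"."TBLS" ALTER COLUMN "IS_REWRITE_ENABLED" SET DEFAULT 'N';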