Inserting a Spark DataFrame into a Hive table breaks the Hive setup

Date: 2018-03-25 04:27:32

Tags: apache-spark hive spark-dataframe

Summary: Trying to insert a Spark DataFrame into a Hive table results in an error loop and a corrupted database.

Details:

  1. The error loop:

    import org.apache.spark.sql.SaveMode  // required for SaveMode.Overwrite in spark-shell

    df.show(5)
    df.write.mode(SaveMode.Overwrite).saveAsTable("dbnamexxx.tablenamexxx")
    

    yields:

    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    | zz|zzz|zzzzzz| zz|zzzzzzz|         zzzzz_zzzz|zzzzzzzzzz_zzzz|zz_zzzzzzzzzz|zz_zzzzz|
    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    |833| 13|     1| 19|    477|2017-11-00 00000000|           null|            0|      29|
    |833|  3|     1| 13|    280|2017-11-00 00000000|           null|            0|      29|
    |833|  9|     1| 13|    442|2017-11-00 00000000|           null|            0|      29|
    |833|  3|     1| 19|    173|2017-11-00 00000000|           null|            0|      29|
    |833| 14|     1| 17|    360|2017-11-00 00000000|           null|            0|      29|
    +---+---+------+---+-------+-------------------+---------------+-------------+--------+
    

    (included just to show that the table itself is fine)

    Then the error (repeating roughly every ~2 seconds):

    [Stage 5:===>                                                    (13 + 4) / 200]2018-03-25 01:12:53 WARN  DFSClient:611 - Caught exception 
    java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1252)
        at java.lang.Thread.join(Thread.java:1326)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:546)
    2018-03-25 01:13:04 WARN  Persist:96 - Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'  cannot accept a NULL value.
    2018-03-25 01:13:04 ERROR RetryingHMSHandler:173 - Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDODataStoreException: Insert of object "org.apache.hadoop.hive.metastore.model.MTable@5e251945" using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'  cannot accept a NULL value.
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
        at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
        at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
        at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:814)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
        at com.sun.proxy.$Proxy15.createTable(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1416)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1449)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at com.sun.proxy.$Proxy17.create_table_with_environment_context(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2050)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:97)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:669)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:657)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
        at com.sun.proxy.$Proxy18.createTable(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:714)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:468)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
        at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:466)
        at org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:479)
        at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:367)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:243)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
        at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:184)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
        at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
        at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:458)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:437)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:393)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:35)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:37)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:39)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
        at $line29.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
        at $line29.$read$$iw$$iw$$iw$$iw.<init>(<console>:45)
        at $line29.$read$$iw$$iw$$iw.<init>(<console>:47)
        at $line29.$read$$iw$$iw.<init>(<console>:49)
        at $line29.$read$$iw.<init>(<console>:51)
        at $line29.$read.<init>(<console>:53)
        at $line29.$read$.<init>(<console>:57)
        at $line29.$read$.<clinit>(<console>)
        at $line29.$eval$.$print$lzycompute(<console>:7)
        at $line29.$eval$.$print(<console>:6)
        at $line29.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:427)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5$$anonfun$apply$6.apply(ILoop.scala:423)
        at scala.reflect.io.Streamable$Chars$class.applyReader(Streamable.scala:111)
        at scala.reflect.io.File.applyReader(File.scala:50)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1$$anonfun$apply$5.apply(ILoop.scala:423)
        at scala.tools.nsc.interpreter.ILoop.savingReplayStack(ILoop.scala:91)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$interpretAllFrom$1.apply(ILoop.scala:422)
        at scala.tools.nsc.interpreter.ILoop.savingReader(ILoop.scala:96)
        at scala.tools.nsc.interpreter.ILoop.interpretAllFrom(ILoop.scala:421)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:577)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$run$3$1.apply(ILoop.scala:576)
        at scala.tools.nsc.interpreter.ILoop.withFile(ILoop.scala:570)
        at scala.tools.nsc.interpreter.ILoop.run$3(ILoop.scala:576)
        at scala.tools.nsc.interpreter.ILoop.loadCommand(ILoop.scala:583)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$standardCommands$8.apply(ILoop.scala:207)
        at scala.tools.nsc.interpreter.LoopCommands$LineCmd.apply(LoopCommands.scala:62)
        at scala.tools.nsc.interpreter.ILoop.colonCommand(ILoop.scala:688)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:679)
        at scala.tools.nsc.interpreter.ILoop.loadFiles(ILoop.scala:835)
        at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:111)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:76)
        at org.apache.spark.repl.Main$.main(Main.scala:56)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    
        [...]
    
        at org.apache.derby.impl.sql.execute.InsertResultSet.getNextRowCore(Unknown Source)
        at org.apache.derby.impl.sql.execute.InsertResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        ... 154 more
    

    (Jeez, these stack traces are long)

    My guess is that the most relevant lines are:

        2018-03-25 01:13:04 WARN  Persist:96 - Insert of object
        "org.apache.hadoop.hive.metastore.model.MTable@5e251945"
        using statement "INSERT INTO TBLS (TBL_ID,CREATE_TIME,
        VIEW_EXPANDED_TEXT,SD_ID,OWNER,TBL_TYPE,LAST_ACCESS_TIME,
        VIEW_ORIGINAL_TEXT,TBL_NAME,DB_ID,RETENTION) VALUES
        (?,?,?,?,?,?,?,?,?,?,?)" failed : Column 'IS_REWRITE_ENABLED'
        cannot accept a NULL value.
    

    (line breaks inserted by me)

  2. Corruption of the whole Hive setup:

    $ clear ; hive -e "use xxx; show tables;"                 
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/xxx/bin/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/xxx/bin/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/home/xxx/bin/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
    FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    
  3. Data and some paths have been sanitized.

    To restore order, I delete the metastore_db and derby.log files and then run: schematool -initSchema -dbType derby. A rough sketch of this reset follows.
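
    A minimal sketch of that reset, assuming the embedded Derby metastore lives in the working directory that spark-shell / hive was started from (adjust paths for your layout):

        # Wipe the broken embedded Derby metastore and its log file,
        # then recreate an empty metastore schema.
        rm -rf metastore_db derby.log
        schematool -initSchema -dbType derby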

    I only started fiddling with this Spark + Hive setup yesterday, so any kind of workaround is welcome.

    Thanks in advance!

2 answers:

Answer 0 (score: 0)

After looking for more alternatives:

Cloudera Hive on Spark 2.x?

which points to: apache-hive-1.2.2-bin.tar.gz

So I went back to the following versions:

  • hadoop-2.7.5.tar.gz

  • spark-2.3.0-bin-hadoop2.7.tgz

  • apache-hive-2.3.2-bin.tar.gz

Now it works as expected.

Previously I had been using:

  • hadoop-3.0.0.tar.gz

  • spark-2.3.0-bin-hadoop2.7.tgz


which were the latest versions as of the time of this post.

Good luck!

Answer 1 (score: 0)

I ran into the same problem with Spark 2.3.

This error is fixed by the latest Hive 3.0, released in May 2018, which contains the schema fix HIVE-18046 that targets exactly this error. Note that the fix only shipped in Hive 3.0 and in a separate SQL upgrade package, which you can check from here. Hive versions released before May 2018 do not contain this fix.

If you are not on the latest Hive, you may need to run the SQL that fixes this error manually; a rough sketch is given below.
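
For an embedded Derby metastore, a minimal sketch of what that manual SQL could look like is shown below, using Derby's ij tool to reach the database while Hive itself is unusable. The exact DDL, schema name, and default value are assumptions inferred from the failing INSERT above and from the intent of HIVE-18046; check the official upgrade scripts for your Hive version, and back up the metastore directory, before running anything.

    # Hypothetical sketch: give TBLS.IS_REWRITE_ENABLED a non-NULL default so that older
    # metastore clients that omit the column can still insert rows. Assumes Derby's ij tool
    # is on the PATH, the metastore is in ./metastore_db, and no other process has it open.
    ij <<'EOF'
    connect 'jdbc:derby:metastore_db';
    ALTER TABLE APP.TBLS ALTER COLUMN IS_REWRITE_ENABLED SET DEFAULT 'N';
    disconnect;
    exit;
    EOF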