Question

我已成功使用Hive on Spark几周，执行包含聚合数据的CTAS语句。最近，我需要使用直接选择语句（没有GROUP BY）执行CTAS，并且对它失败感到惊讶。我发现我可以在MR上用Hive执行声明就好了。我创建了一个隔离我的问题的测试。

创建源表：

hive -e "

    CREATE DATABASE IF NOT EXISTS testhiveonspark;
    DROP TABLE IF EXISTS testhiveonspark.temptable1;
    CREATE TABLE testhiveonspark.temptable1 (num1 INT, num2 INT)
    STORED AS TEXTFILE;
    INSERT into table testhiveonspark.temptable1 values (1, 1);
    select * from testhiveonspark.temptable1;

"

执行一个有效的CTAS文件（使用SELECT GROUP BY）......这样就完成了：

cat > testhiveonspark_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://hdfsnode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1 GROUP BY num1, num2;
    SELECT * FROM testhiveonspark.temptable2;

EOF
hive -f testhiveonspark_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.931 seconds
Query ID = jolsen_20161112104040_4e398c2c-4691-47c8-ab79-dfd56195bf2a
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 3f263511-afa9-48ba-b224-1496194d5d3e
Running with YARN Application = application_1478903884683_0019
Kill Command = /usr/lib/hadoop/bin/yarn application -kill application_1478903884683_0019

Query Hive on Spark job[0] stages:
0
1

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-11-12 10:41:15,453 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:18,482 Stage-0_0: 0/1  Stage-1_0: 0/1
2016-11-12 10:41:21,504 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1
2016-11-12 10:41:24,533 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1
2016-11-12 10:41:25,542 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 18.15 seconds
Moving data to: hdfs://hdfsnode:8020/user/hive/warehouse/testhiveonspark.db/temptable2
Table testhiveonspark.temptable2 stats: [numFiles=1, numRows=1, totalSize=4, rawDataSize=3]
OK
Time taken: 83.03 seconds
OK
1       1
Time taken: 0.203 seconds, Fetched: 1 row(s)

执行一个不工作的CTAS文件（仅使用SELECT，没有GROUP BY）......这没有完成：

cat > testhiveonspark_not_working.hql << EOF

    set mapred.job.queue.name=root.apps10;
    set spark.master=yarn-client;
    set hive.server2.enable.doAs=false;
    set hive.execution.engine=spark;
    set spark.eventLog.enabled=true;
    set spark.shuffle.blockTransferService=nio;
    set spark.eventLog.dir=hdfs://HDFSNode:8020/user/spark/applicationHistory;
    set spark.shuffle.service.enabled=true;
    set spark.dynamicAllocation.enabled=true;

    DROP TABLE IF EXISTS testhiveonspark.temptable2;
    CREATE TABLE testhiveonspark.temptable2
    STORED AS TEXTFILE
    AS SELECT num1, num2 FROM testhiveonspark.temptable1;

EOF
hive -f testhiveonspark_not_working.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 1.041 seconds
Query ID = jolsen_20161112111313_018a2914-6a3b-428c-af7e-81e0a99a302d
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = c2affd61-1a39-4765-b3ac-e27bc79562ab
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

我正在运行版本＆＃34; Hive 1.1.0-cdh5.8.0＆＃34;和Spark＆＃34;版本1.6.0＆＃34;，并认为这可能与我不知道的错误或默认设置有关。 cdh 5.8.2的发行说明显示了一张似乎与我的问题（https://issues.apache.org/jira/browse/HIVE-12650）相关的票证，因此决定将我的集群升级到最新版本cdh5.9.0。不幸的是，我仍然遇到同样的问题。

我＆＃34;思考＆＃34;日志表明Hive在收到Application Master的响应之前超时，但老实说我不确定。我还在调查。我也在使用调试运行配置单元，但没有看到任何有用的东西......日志似乎确认了我已经从错误输出中观察到的内容：＆＃34; hive --hiveconf hive.root。 logger = DEBUG，console -f testhiveonspark_working.hql＆＃34;

失败的原因是什么？两个CTAS声明没有那么大的不同。为什么一个工作而另一个工作失败？

谢谢。

我在调试中运行了我的工作和不工作测试，例如＆＃34; hive --hiveconf hive.root.logger = DEBUG，console -f testhiveonspark_not_working.hql＆＃34;并比较了产出。从三角洲开始，我尝试从日志中选择我认为最相关的部分，我在下面粘贴。我仍然在努力理解为什么＆＃34; Job并没有在61s＆＃34;之后提交，以及这个问题的根源。有帮助吗？谢谢。

 ...

6/11/12 11:42:56 [main]: INFO client.SparkClientImpl: Running client driver with argv: /usr/lib/spark/bin/spark-submit --properties-file /tmp/spark-submit.5047982490856668419.properties --class org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar --remote-host HDFSNode --remote-port 42944 --conf hive.spark.client.connect.timeout=1000 --conf hive.spark.client.server.connect.timeout=90000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Class path contains multiple SLF4J bindings.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/11/12 11:42:57 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:57 INFO RemoteDriver: Connecting to: HDFSNode:42944
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 WARN Rpc: Invalid log level null, reverting to default.
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.ResourceLeakDetector: -Dio.netty.leakDetectionLevel: simple
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG util.Recycler: -Dio.netty.recycler.maxCapacity.default: 262144
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (41 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG internal.Cleaner0: java.nio.ByteBuffer.cleaner(): available
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (98 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Decoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (275 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Handling SASL challenge message...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: Sending SASL challenge response...
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$SaslMessage (45 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.RpcServer$SaslServerHandler: SASL negotiation finished with QOP auth.
16/11/12 11:42:58 [main]: DEBUG session.SparkSessionManagerImpl: New session (23ef000b-8281-4c2d-a1db-077a3d40366f) is created.
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitJob from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
16/11/12 11:42:58 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:58 INFO SparkContext: Running Spark version 1.6.0
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 0 (org.apache.hive.spark.client.BaseProtocol$SyncJobRequest).
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$SyncJobRequest (168 bytes)
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1: masked=rwx------
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #22
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #22
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 9ms
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #23
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #23
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1ms
16/11/12 11:42:58 [main]: INFO ql.Context: New scratch dir is hdfs://HDFSNode:8020/tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1
16/11/12 11:42:58 [main]: DEBUG hdfs.DFSClient: /tmp/hive/jolsen/c1dcb42a-8312-42df-a3c2-c1c359e10011/hive_2016-11-12_11-42-54_731_1054170393347011832-1/-mr-10003: masked=rwxr-xr-x
16/11/12 11:42:58 [IPC Parameter Sending Thread #0]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen sending #24
16/11/12 11:42:58 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen got value #24
16/11/12 11:42:58 [main]: DEBUG ipc.ProtobufRpcEngine: Call: mkdirs took 5ms
16/11/12 11:42:58 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 1 (org.apache.hive.spark.client.BaseProtocol$JobRequest).
16/11/12 11:42:58 [main]: DEBUG client.SparkClientImpl: Send JobRequest[3e50626e-3a1e-47d0-a18c-76869e1277ae].
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.rpc.Rpc$MessageHeader (5 bytes)
16/11/12 11:42:58 [RPC-Handler-3]: DEBUG rpc.KryoMessageCodec: Encoded message of type org.apache.hive.spark.client.BaseProtocol$JobRequest (174708 bytes)
16/11/12 11:42:58 [main]: INFO log.PerfLogger: </PERFLOG method=SparkSubmitJob start=1478968978795 end=1478968978987 duration=192 from=org.apache.hadoop.hive.ql.exec.spark.SparkTask>
Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO exec.Task: Starting Spark Job = 3e50626e-3a1e-47d0-a18c-76869e1277ae
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkRunJob from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:42:58 [main]: INFO log.PerfLogger: <PERFLOG method=SparkSubmitToRunning from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
state = SENT
16/11/12 11:42:58 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriver' on port 43254.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Slf4jLogger: Slf4jLogger started
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Starting remoting
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.1.99:53727]
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53727.
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering MapOutputTracker
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO SparkEnv: Registering BlockManagerMaster
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1353a7a8-14f0-4b9d-b369-1224fbc9b331
16/11/12 11:42:59 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:42:59 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
state = SENT
16/11/12 11:42:59 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkEnv: Registering OutputCommitCoordinator
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkUI: Started SparkUI at http://192.168.1.99:4040
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO SparkContext: Added JAR file:/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.9.0.jar at spark://192.168.1.99:43254/jars/hive-exec-1.1.0-cdh5.9.0.jar with timestamp 1478968980504
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO RMProxy: Connecting to ResourceManager at HDFSNode/192.168.1.99:8032
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Requesting a new application from cluster with 10 NodeManagers
state = SENT
16/11/12 11:43:00 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container)
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/11/12 11:43:00 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:00 INFO Client: Setting up container launch context for our AM
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Setting up the launch environment for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Preparing resources for our AM container
16/11/12 11:43:01 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:01 INFO Client: Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/spark-assembly-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar
state = SENT
16/11/12 11:43:01 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:02 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:02 INFO Client: Uploading resource file:/tmp/spark-695bd22c-574c-41ff-9e38-fa491f09c3f6/__spark_conf__2574250824979235523.zip -> hdfs://HDFSNode:8020/user/jolsen/.sparkStaging/application_1478903884683_0032/__spark_conf__2574250824979235523.zip
state = SENT
16/11/12 11:43:02 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing view acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: Changing modify acls to: jolsen
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jolsen); users with modify permissions: Set(jolsen)
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO Client: Submitting application 32 to ResourceManager
16/11/12 11:43:03 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:03 INFO YarnClientImpl: Submitted application application_1478903884683_0032
state = SENT
16/11/12 11:43:04 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:05 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:06 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:07 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:08 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:08 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: closed
16/11/12 11:43:08 [IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen]: DEBUG ipc.Client: IPC Client (737904169) connection to HDFSNode/192.168.1.99:8020 from jolsen: stopped, remaining connections 0
state = SENT
16/11/12 11:43:09 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:10 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> HDFSNode, PROXY_URI_BASES -> http://HDFSNode:8151/proxy/application_1478903884683_0032), /proxy/application_1478903884683_0032
16/11/12 11:43:10 [stderr-redir-1]: INFO client.SparkClientImpl: 16/11/12 11:43:10 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
state = SENT
16/11/12 11:43:11 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:12 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:13 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:14 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:15 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:16 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:17 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:18 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:19 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:20 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:21 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:22 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:22 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:23 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:24 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:25 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:26 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:27 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:28 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:29 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:30 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:31 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:32 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:33 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:34 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:35 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:35 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:36 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:37 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:38 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:39 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:40 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:41 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:42 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:43 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:44 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:45 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:46 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:47 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:48 [main]: INFO status.SparkJobMonitor: state = SENT
16/11/12 11:43:48 [main-SendThread(HDFSNode05:2181)]: DEBUG zookeeper.ClientCnxn: Got ping response for sessionid: 0x5585583fa3e000e after 0ms
state = SENT
16/11/12 11:43:49 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:50 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:51 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:52 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:53 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:54 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:55 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:56 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:57 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:58 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:43:59 [main]: INFO status.SparkJobMonitor: state = SENT
state = SENT
16/11/12 11:44:00 [main]: INFO status.SparkJobMonitor: state = SENT
Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Job hasn't been submitted after 61s. Aborting it.
Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
Status: SENT
16/11/12 11:44:00 [main]: ERROR status.SparkJobMonitor: Status: SENT
16/11/12 11:44:00 [main]: INFO log.PerfLogger: </PERFLOG method=SparkRunJob start=1478968978989 end=1478969040067 duration=61078 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
16/11/12 11:44:00 [main]: DEBUG rpc.RpcDispatcher: [ClientProtocol] Registered outstanding rpc 2 (org.apache.hive.spark.client.BaseProtocol$CancelJob).

Answer 1

我挖掘了Hive源代码，找到了日志消息的来源（例如“61s后没有提交作业”）我看到输出到控制台。从代码中，我能够找到一个属性“hive.spark.job.monitor.timeout”，它默认为60s，几乎是我的工作超时的确切时间，因此被认为必须是正确的属性。我再次尝试了我的工作，每次增加“hive.spark.job.monitor.timeout”，并且在将其增加到“180s”后，我的工作最终在超时之前执行。问题解决了。

我不知道为什么我的工作应该在3分钟内实际执行，这似乎是一个非常长的延迟，但我会把这项研究留到另一个时间。

这是我的最终代码，有效：

sed '$!{N;/^[a-zA-Z ][^\n]\+\n[a-zA-Z ]/{s/\n/ /}}'

Spark CTAS上的Hive使用Straight SELECT失败但成功使用SELECT GROUP BY

1 个答案: