Running Hive on Spark keeps giving me this error. I have tried many different versions of Hive and Spark.
I am running Hadoop 2.7.3 in standalone mode. This is a DIY setup, currently using Spark 2.2 with Hive 2.3.5.
I have tried different combinations of Hive and Spark versions, but I am not sure what the problem is or how to debug it:
0: jdbc:hive2://192.168.71.62:10000> select count(*) from traffic;
Getting log thread is interrupted, since query is done!
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark client.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
... 11 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '9be4c047-285d-4578-a934-7bd51294d240'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
19/05/20 12:39:42 WARN util.Utils: Your hostname, suypc183-OptiPlex-3020 resolves to a loopback address: 127.0.0.1; using 192.168.71.62 instead (on interface enp2s0)
19/05/20 12:39:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/20 12:39:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/05/20 12:39:43 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
19/05/20 12:39:43 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/05/20 12:39:43 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/05/20 12:39:43 INFO yarn.Client: Setting up container launch context for our AM
19/05/20 12:39:43 INFO yarn.Client: Setting up the launch environment for our AM container
19/05/20 12:39:43 INFO yarn.Client: Preparing resources for our AM container
19/05/20 12:39:44 INFO yarn.Client: Deleted staging directory hdfs://localhost:9000/user/anonymous/.sparkStaging/application_1558334426394_0004
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:369)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:490)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:882)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1226)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:125)
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:97)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
... 22 more
The site configuration is as follows (a note on spark.yarn.archive comes right after it):
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>Scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>spark.master</name>
  <value>yarn</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>2048</value>
</property>
<property>
  <name>spark.yarn.archive</name>
  <value>hdfs://localhost:8088/user/jars/</value>
</property>
<property>
  <name>spark.home</name>
  <value>/home/danielphingston/spark</value>
</property>
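For reference, the Spark on YARN documentation treats spark.yarn.archive as an archive containing the Spark jars, and the HDFS NameNode in the log above answers on port 9000 (8088 is normally the YARN ResourceManager web UI port). Below is a minimal sketch of building and uploading such an archive; the archive name spark-libs.jar and the target directory are assumed examples, not values taken from this setup:

# Bundle the local Spark jars into a single uncompressed archive (the name is an assumed example).
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
# Upload it to HDFS through the NameNode port seen in the log (9000), not the RM web UI port (8088).
hdfs dfs -mkdir -p /user/jars
hdfs dfs -put spark-libs.jar hdfs://localhost:9000/user/jars/spark-libs.jar
# spark.yarn.archive would then point at the archive itself, e.g.
#   hdfs://localhost:9000/user/jars/spark-libs.jar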
Answer 0 (score: 0)
The most common connectivity problem between Hive and Spark is making sure Spark knows where the Hadoop conf directory is. We solved it by adding the following to Spark's spark-env.sh file:
# Default to the YARN config shipped alongside Spark when HADOOP_CONF_DIR is not already set.
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
# Append the Hive configuration directory when it exists, then export the result for Spark to pick up.
if [ -d "$HIVE_CONF_DIR" ]; then
  HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR
This lets Spark see where the Hadoop configuration directory lives on the filesystem.
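As a quick check after editing spark-env.sh, restarting HiveServer2 and re-running the failing query shows whether the Spark client can now be created. A minimal sketch; how HiveServer2 is launched in this DIY setup is an assumption:

# Restart HiveServer2 so the updated spark-env.sh is picked up
# (the launch command is an assumption about this particular setup).
hive --service hiveserver2 &
# Re-run the failing query from the question through beeline.
beeline -u jdbc:hive2://192.168.71.62:10000 -e "select count(*) from traffic;"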