I'm having a problem connecting Spark 2.4, running in a Docker container inside Kubernetes, to a Dataproc cluster (also on Spark 2.4). I'm getting a java.net.UnknownHostException for the local Kubernetes hostname. This network configuration works with Spark 2.2, so something seems to have changed in how Spark resolves hostnames.
If I modify /etc/hosts so that the hostname and the Kubernetes hostname point to 127.0.0.1, it works (with warnings). However, this is a workaround I'm not happy with, since it overrides the Kubernetes default entries and could affect other things.
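For reference, the workaround amounts to an /etc/hosts entry along these lines (a sketch, based on the pod's hosts file shown further below):

# workaround (not recommended): point the pod's FQDN at loopback
127.0.0.1 localhost jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local jupyter-21001-0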
Error stack:
(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-02-01 18:47:25 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-01 18:47:27 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:514)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:797)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:827)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local:38831
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
2019-02-01 18:47:59 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2019-02-01 18:47:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
/opt/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
File "/opt/spark/python/pyspark/shell.py", line 41, in <module>
spark = SparkSession._create_shell_session()
File "/opt/spark/python/pyspark/sql/session.py", line 583, in _create_shell_session
return SparkSession.builder.getOrCreate()
File "/opt/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/opt/spark/python/pyspark/context.py", line 349, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/opt/spark/python/pyspark/context.py", line 118, in __init__
conf, jsc, profiler_cls)
File "/opt/spark/python/pyspark/context.py", line 180, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "/opt/spark/python/pyspark/context.py", line 288, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
answer, self._gateway_client, None, self._fqn)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Spark version:
(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_152-release
Branch
Compiled by user on 2018-10-29T06:22:05Z
Revision
Url
Type --help for more information.
Hosts file:
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.18.168.47 jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local jupyter-21001-0
spark-defaults.conf:
spark.master yarn
spark.driver.extraClassPath /opt/spark/lib/gcs-connector-latest-hadoop2.jar
spark.executor.extraClassPath /usr/lib/hadoop/lib/gcs-connector.jar:/usr/lib/hadoop/lib/bigquery-connector.jar
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://jupyter-21001-m</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://jupyter-21001-m</value>
</property>
<property>
<name>fs.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
<description>The FileSystem for gs: (GCS) uris.</description>
</property>
<property>
<name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
<description>The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.</description>
</property>
<property>
<name>fs.gs.project.id</name>
<value>NOT_RUNNING_INSIDE_GCE</value>
</property>
<property>
<name>google.cloud.auth.service.account.enable</name>
<value>true</value>
</property>
<property>
<name>google.cloud.auth.service.account.json.keyfile</name>
<value>/oauth/auth.json</value>
</property>
<property>
<name>fs.gs.implicit.dir.repair.enable</name>
<value>false</value>
<description>
Whether or not to create objects for the parent directories of objects
with / in their path e.g. creating gs://bucket/foo/ upon finding
gs://bucket/foo/bar.
</description>
</property>
</configuration>
Answer (score: 1)
This sounds painful. At a high level, I'd suggest first looking at how to run Jupyter notebooks inside of Dataproc, but I'll try to answer the question as asked.
In YARN client mode (which the pyspark shell uses), the YARN ApplicationMaster needs to call back to the driver, so you need to set spark.driver.host and spark.driver.port to something the ApplicationMaster can reach. Assuming this is GKE in the same GCE network as your Dataproc cluster, a NodePort is probably the best option for exposing the driver to the GCE network.
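A minimal sketch of that NodePort idea, assuming the driver pod carries a label like app: jupyter-21001 and the driver port is pinned to 30831 (the Service name, label, and port are all illustrative; a fixed port only works if spark.driver.port is set explicitly):

# Hypothetical Service exposing the Spark driver's RPC port on each GKE node
apiVersion: v1
kind: Service
metadata:
  name: spark-driver
spec:
  type: NodePort
  selector:
    app: jupyter-21001        # assumed pod label
  ports:
    - name: driver-rpc
      port: 30831             # must match the fixed spark.driver.port
      targetPort: 30831
      nodePort: 30831         # opened on each node's GCE-internal IP

Keeping port, targetPort, and nodePort identical avoids any port rewriting, since Spark advertises the same port it binds to.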
The part I don't know is how to pass the node IP and port into the pyspark pod so that you can set spark.driver.host and spark.driver.port correctly (GKE experts, please chime in here).
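One possibility (untested, a sketch only) is the Kubernetes Downward API, which can inject the hosting node's IP into the pod as an environment variable:

# Pod spec fragment: expose the node's IP to the container via the Downward API
env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

The shell could then start pyspark with something like:

pyspark \
  --conf spark.driver.host=${NODE_IP} \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.port=30831

where 30831 matches the NodePort sketched above. Note that executors also connect back to the driver's block manager, so spark.driver.blockManager.port would likely need the same treatment.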
I don't know the details of your setup, but I wouldn't edit the GKE VM's /etc/hosts myself. I also find it strange that this changed between 2.2 and 2.4 (is it possible the 2.2 setup wasn't actually running against YARN on Dataproc?).
Finally, you didn't mention it, but you will need to set yarn.resourcemanager.hostname=jupyter-21001-m in core-site.xml or yarn-site.xml.
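In the core-site.xml shown above, that would be one more property block:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>jupyter-21001-m</value>
</property>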
As a partial sanity check, the following should successfully create a YARN AppMaster in Dataproc (which will then fail to call back to GKE):
kubectl run spark -i -t --image gcr.io/spark-operator/spark:v2.4.0 \
--generator=run-pod/v1 --rm --restart=Never --env HADOOP_CONF_DIR=/tmp \
--command -- /opt/spark/bin/spark-submit \
--master yarn \
--conf spark.hadoop.fs.defaultFS=hdfs://jupyter-21001-m \
--conf spark.hadoop.yarn.resourcemanager.hostname=jupyter-21001-m \
--class org.apache.spark.examples.SparkPi \
/opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
Sorry for the incomplete answer; I'll update it if I can fill in the missing pieces.