Spark 2.4 connecting from a container to Dataproc: java.net.UnknownHostException

Asked: 2019-02-04 17:13:34

Tags: apache-spark google-kubernetes-engine google-cloud-dataproc

I'm having trouble connecting Spark 2.4, running in a Docker container in Kubernetes, to a Dataproc cluster (also on Spark 2.4). I get a java.net.UnknownHostException for the local Kubernetes hostname. This same network configuration works with Spark 2.2, so something seems to have changed in how Spark resolves hostnames.

If I modify /etc/hosts so that the hostname and the Kubernetes hostname point to 127.0.0.1, it works, with a warning. However, I'm not comfortable with that workaround, since it overrides the default Kubernetes entries and could affect other things.

Error stack:

(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-02-01 18:47:25 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-01 18:47:27 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:514)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:797)
        at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:827)
        at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local:38831
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
        at java.security.AccessController.doPrivileged(Native Method)
        at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
        at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
        at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
        at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
 
2019-02-01 18:47:59 WARN  YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2019-02-01 18:47:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)
/opt/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 41, in <module>
    spark = SparkSession._create_shell_session()
  File "/opt/spark/python/pyspark/sql/session.py", line 583, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "/opt/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/spark/python/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/opt/spark/python/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/opt/spark/python/pyspark/context.py", line 288, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)

Spark version:

(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
 
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_152-release
Branch
Compiled by user  on 2018-10-29T06:22:05Z
Revision
Url
Type --help for more information.

Hosts file:

# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.18.168.47   jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local   jupyter-21001-0

spark-defaults.conf

spark.master yarn
spark.driver.extraClassPath /opt/spark/lib/gcs-connector-latest-hadoop2.jar
spark.executor.extraClassPath /usr/lib/hadoop/lib/gcs-connector.jar:/usr/lib/hadoop/lib/bigquery-connector.jar

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://jupyter-21001-m</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://jupyter-21001-m</value>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>NOT_RUNNING_INSIDE_GCE</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.json.keyfile</name>
    <value>/oauth/auth.json</value>
  </property>
  <property>
    <name>fs.gs.implicit.dir.repair.enable</name>
    <value>false</value>
    <description>
      Whether or not to create objects for the parent directories of objects
      with / in their path e.g. creating gs://bucket/foo/ upon finding
      gs://bucket/foo/bar.
    </description>
  </property>
</configuration>

1 Answer

Answer 0 (score: 1)

This sounds like a hard setup. At a high level, I'd suggest first looking at running Jupyter notebooks inside of Dataproc, but I'll try to answer the question.

In YARN client mode (which the pyspark shell uses), the ApplicationMaster needs to dial back to the driver, so you need to set spark.driver.host and spark.driver.port to something the AppMaster can reach. Assuming the pod runs on GKE in the same GCE network as your Dataproc cluster, a NodePort Service is probably the best option for exposing the driver to the GCE network.
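For illustration, a minimal sketch of that idea, assuming a hypothetical Service name, pod label, and port numbers in the default NodePort range (none of these come from the question):

# Sketch: a NodePort Service exposing the driver's RPC and block-manager
# ports to the GCE network. Keeping port, targetPort, and nodePort equal
# ensures the advertised spark.driver.port matches what is reachable.
apiVersion: v1
kind: Service
metadata:
  name: jupyter-spark-driver      # hypothetical name
spec:
  type: NodePort
  selector:
    app: jupyter-21001            # assumed label on the notebook pod
  ports:
  - name: driver-rpc
    port: 30413
    targetPort: 30413
    nodePort: 30413
  - name: block-manager
    port: 30414
    targetPort: 30414
    nodePort: 30414

with matching driver settings in spark-defaults.conf:

spark.driver.port        30413
spark.blockManager.port  30414
spark.driver.bindAddress 0.0.0.0    # bind inside the pod
spark.driver.host        <node IP>  # the GCE-internal node IP the AppMaster dials back to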

The part I don't know is how to pass the node IP and port into the pyspark pod so that you can set spark.driver.host and spark.driver.port correctly (GKE experts, please weigh in).
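One possibility (an assumption on my part, not something I've verified on GKE) is the Kubernetes downward API, which can inject the hosting node's IP into the container as an environment variable:

# Sketch: fragment of the notebook pod's container spec; the downward API
# exposes the IP of the node the pod is scheduled on as NODE_IP.
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

pyspark could then be launched with something like --conf spark.driver.host=$NODE_IP --conf spark.driver.port=30413 (ports matching the NodePort sketch above).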

I don't know the details of your setup, but I wouldn't edit /etc/hosts on the GKE VMs myself either. I also find it odd that this changed between 2.2 and 2.4 (is it possible the 2.2 setup wasn't actually running on YARN on Dataproc?).

Finally, you didn't mention it, but you will need yarn.resourcemanager.hostname=jupyter-21001-m set in core-site.xml or yarn-site.xml.
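For reference, a minimal snippet for the container's yarn-site.xml, using the Dataproc master name from your configs:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>jupyter-21001-m</value>
</property>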

As a partial check: the following should successfully create a YARN AppMaster on Dataproc (which will then fail to dial back into GKE):

kubectl run spark -i -t --image gcr.io/spark-operator/spark:v2.4.0 \
  --generator=run-pod/v1 --rm --restart=Never --env HADOOP_CONF_DIR=/tmp \
  --command -- /opt/spark/bin/spark-submit \
    --master yarn \
    --conf spark.hadoop.fs.defaultFS=hdfs://jupyter-21001-m \
    --conf spark.hadoop.yarn.resourcemanager.hostname=jupyter-21001-m \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

Sorry for the incomplete answer; I'll update it if I manage to fill in the missing pieces.