Initializing a Jupyter kernel on Spark: PYTHON_WORKER_FACTORY_SECRET error?

Date: 2018-09-12 12:22:11

Tags: apache-spark pyspark jupyter

I am trying to set up Jupyter Enterprise Gateway with Spark. I can currently connect a Jupyter notebook to a kernel in client mode, but when submitting jobs I hit two errors: PYTHON_WORKER_FACTORY_SECRET in client mode and PYSPARK_GATEWAY_SECRET in cluster mode.

In cluster mode, PYSPARK_GATEWAY_SECRET is missing:

File "/opt/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "launch_ipykernel.py", line 62, in initialize_spark_session
    spark = SparkSession.builder.getOrCreate()
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 343, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 292, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py", line 47, in launch_gateway
    gateway_secret = os.environ["PYSPARK_GATEWAY_SECRET"]
  File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYSPARK_GATEWAY_SECRET'

This also seems to cause the following error on the second attempt:

Container: container_e03_1536582358787_0027_02_000001 on spark-worker-1.c.mozn-location.internal_45454
LogAggregationType: AGGREGATED
======================================================================================================
LogType:stdout
LogLastModifiedTime:Tue Sep 11 07:35:25 +0000 2018
LogLength:520
LogContents:
Using connection file '/tmp/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb_jvq2h0jy.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb.json'
Signal socket bound to host: 0.0.0.0, port: 46611
Traceback (most recent call last):
  File "launch_ipykernel.py", line 319, in <module>
    lower_port, upper_port)
  File "launch_ipykernel.py", line 142, in return_connection_info
    s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused

End of LogType:stdout
***********************************************************************

In client mode, PYTHON_WORKER_FACTORY_SECRET is missing:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 170, in manager
    code = worker(sock, authenticated)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 62, in worker
    if os.environ["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
  File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
18/09/11 07:26:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset

I have tried setting this variable both by exporting it and by setting it directly on the kernel side:

[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_client/kernel.json 
{
  "language": "python",
  "display_name": "Spark - Python (YARN Client Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.distributed.DistributedProcessProxy"
  },
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
    "PYTHONPATH": "/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
    "PYTHON_WORKER_FACTORY_SECRET": "w<X?u6I&Ekt>49n}K5kBJ^QM@Zz)Mf",
    "SPARK_OPTS": "--master yarn --deploy-mode client --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID}",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_yarn_client/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
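To check whether the `env` block in kernel.json actually reaches the kernel process, a quick diagnostic can be run from inside a notebook cell (a sketch for debugging only, not part of the gateway itself):

```python
import os

def env_report(names):
    """Return {name: True/False} for whether each variable is set in this process."""
    return {n: n in os.environ for n in names}

# Check the two secrets from the tracebacks above, as seen by the kernel.
print(env_report(["PYTHON_WORKER_FACTORY_SECRET", "PYSPARK_GATEWAY_SECRET"]))
```

If the variable shows as missing here, the problem is in how the gateway launches the kernel, not in Spark itself.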

For client mode, do you think this is related to the following link: get os environment variable for PYTHON_WORKER_FACTORY_SECRET and java ports?

Regarding cluster mode, my understanding is that Spark's [PythonRunner][1] will automatically initialize the secret that java_gateway uses.
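As a simplified sketch of what the traceback above implies (this mirrors the env-var lookup shown in pyspark/java_gateway.py, but is not the real implementation): when the JVM-side PythonRunner spawns the Python driver, it is expected to export both PYSPARK_GATEWAY_PORT and PYSPARK_GATEWAY_SECRET into the child's environment, which is why neither should normally need to be set by hand.

```python
import os

def gateway_params(environ=os.environ):
    """Sketch of how the Python side picks up the JVM-provided gateway info.

    Returns (port, secret) when the JVM parent exported them, or None when
    there is no JVM parent and PySpark must launch its own gateway.
    """
    if "PYSPARK_GATEWAY_PORT" not in environ:
        return None
    # This lookup is the line that raises KeyError in the traceback above
    # when the JVM exported the port but not the secret.
    return int(environ["PYSPARK_GATEWAY_PORT"]), environ["PYSPARK_GATEWAY_SECRET"]
```

Under this reading, the KeyError means the JVM half of the handshake ran but the secret never made it into the Python process's environment.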

Based on advice from the Enterprise Gateway team, I set the environment variables via spark.yarn.appMasterEnv, as follows:

"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=this_secret_key --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH"

But this results in a timeout; yarn logs -applicationId application_1536672003321_0007 shows:

18/09/12 11:56:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, elyra); groups with view permissions: Set(); users  with modify permissions: Set(yarn, elyra); groups with modify permissions: Set()
18/09/12 11:56:14 INFO ApplicationMaster: Preparing Local resources
18/09/12 11:56:15 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536672003321_0007_000001
18/09/12 11:56:15 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/12 11:56:15 INFO ApplicationMaster: Waiting for spark context initialization...
18/09/12 11:57:55 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/09/12 11:57:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/09/12 11:57:55 INFO ShutdownHookManager: Shutdown hook called

I would like to ask what the recommended way to set these Spark environment variables is, and what I am missing in this setup, since my understanding is that neither PYSPARK_GATEWAY_SECRET nor PYTHON_WORKER_FACTORY_SECRET should need to be set manually.

0 Answers:

There are no answers yet.