I am working on setting up Jupyter Enterprise Gateway to trigger Spark on YARN. I can currently connect a Jupyter notebook to a kernel in client mode, but when the job is submitted I hit two errors, one for PYTHON_WORKER_FACTORY_SECRET and one for PYSPARK_GATEWAY_SECRET, in client and cluster mode respectively.
Cluster mode — PYSPARK_GATEWAY_SECRET is not set:
File "/opt/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/opt/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "launch_ipykernel.py", line 62, in initialize_spark_session
spark = SparkSession.builder.getOrCreate()
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 343, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 292, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py", line 47, in launch_gateway
gateway_secret = os.environ["PYSPARK_GATEWAY_SECRET"]
File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYSPARK_GATEWAY_SECRET'
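The KeyError comes from the bare `os.environ[...]` lookup in `java_gateway.py`, which raises as soon as the variable is absent. A quick way to confirm whether either secret ever reaches the kernel's environment is a small diagnostic like the sketch below (a hypothetical helper, not part of the gateway or PySpark code):

```python
import os

def check_spark_secrets(env=os.environ):
    """Report which of the Spark auth secrets are visible to this process."""
    secrets = ["PYSPARK_GATEWAY_SECRET", "PYTHON_WORKER_FACTORY_SECRET"]
    # os.environ[key] raises KeyError when the variable is unset (as in the
    # traceback above); .get() lets us probe each one without crashing.
    return {name: env.get(name) is not None for name in secrets}

print(check_spark_secrets())
```

Running this inside the launched kernel (e.g. from a notebook cell) shows whether the kernel.json or spark-submit configuration actually propagated the variables to the Python process.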
This also appears to cause the following error in the second application attempt:
Container: container_e03_1536582358787_0027_02_000001 on spark-worker-1.c.mozn-location.internal_45454
LogAggregationType: AGGREGATED
======================================================================================================
LogType:stdout
LogLastModifiedTime:Tue Sep 11 07:35:25 +0000 2018
LogLength:520
LogContents:
Using connection file '/tmp/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb_jvq2h0jy.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb.json'
Signal socket bound to host: 0.0.0.0, port: 46611
Traceback (most recent call last):
File "launch_ipykernel.py", line 319, in <module>
lower_port, upper_port)
File "launch_ipykernel.py", line 142, in return_connection_info
s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused
End of LogType:stdout
***********************************************************************
Client mode — PYTHON_WORKER_FACTORY_SECRET is not set:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 170, in manager
code = worker(sock, authenticated)
File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 62, in worker
if os.environ["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
18/09/11 07:26:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset
I have tried setting this variable either by exporting it or directly in the kernel's kernel.json:
[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_client/kernel.json
{
  "language": "python",
  "display_name": "Spark - Python (YARN Client Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.distributed.DistributedProcessProxy"
  },
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
    "PYTHONPATH": "/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
    "PYTHON_WORKER_FACTORY_SECRET": "w<X?u6I&Ekt>49n}K5kBJ^QM@Zz)Mf",
    "SPARK_OPTS": "--master yarn --deploy-mode client --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID}",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_yarn_client/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
For client mode, do you think this is related to the following issue: get os environment variable for PYTHON_WORKER_FACTORY_SECRET and java ports?
Regarding cluster mode, my understanding is that Spark's [PythonRunner][1] automatically initializes the secret that java_gateway will use.
Based on support from the Enterprise Gateway team, I set the environment variables through spark.yarn.appMasterEnv, as shown below:
"SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=this_secret_key --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH"
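One thing worth noting (if my understanding of the two config namespaces is right): `spark.yarn.appMasterEnv.*` only sets variables in the YARN ApplicationMaster container, while executor processes, where `pyspark/daemon.py` performs the PYTHON_WORKER_FACTORY_SECRET check, take their environment from `spark.executorEnv.*`. A sketch of propagating a variable to both sides (the secret value here is just a placeholder):

```shell
# Hypothetical config fragment, not a verified fix:
# spark.yarn.appMasterEnv.* -> ApplicationMaster (the driver in cluster mode)
# spark.executorEnv.*       -> executor containers (where pyspark/daemon.py runs)
--conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=this_secret_key \
--conf spark.executorEnv.PYTHON_WORKER_FACTORY_SECRET=this_secret_key
```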
But this causes a timeout. `yarn logs -applicationId application_1536672003321_0007` shows:
18/09/12 11:56:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, elyra); groups with view permissions: Set(); users with modify permissions: Set(yarn, elyra); groups with modify permissions: Set()
18/09/12 11:56:14 INFO ApplicationMaster: Preparing Local resources
18/09/12 11:56:15 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536672003321_0007_000001
18/09/12 11:56:15 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/12 11:56:15 INFO ApplicationMaster: Waiting for spark context initialization...
18/09/12 11:57:55 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/09/12 11:57:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/09/12 11:57:55 INFO ShutdownHookManager: Shutdown hook called
I would like to ask whether there is a recommended way to set Spark environment variables, and what I am missing in this setup, since my understanding is that neither PYSPARK_GATEWAY_SECRET nor PYTHON_WORKER_FACTORY_SECRET should need to be set manually.