Exiting Python

Date: 2018-11-23 03:26:01

Tags: python apache-spark pyspark apache-spark-standalone

Background:

  • Spark standalone cluster mode on k8s
  • Spark 2.2.1
  • Hadoop 2.7.6
  • Code run in plain Python, not in the pyspark shell
  • Client mode, not cluster mode

The PySpark code runs in a plain Python process, not in the pyspark shell environment. Every job works correctly and produces its result. But sometimes, when the code finishes and exits, the following error shows up, even with a time.sleep(10) after spark.stop() (a minimal sketch of the driver pattern follows the traceback):


{{py4j.java_gateway:1038}} INFO - Error while receiving.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1035, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
Py4JNetworkError: Answer from Java side is empty
[2018-11-22 09:06:40,293] {{root:899}} ERROR - Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1040, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
[2018-11-22 09:06:40,293] {{py4j.java_gateway:443}} DEBUG - Exception while shutting down a socket
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 441, in quiet_shutdown
    socket_instance.shutdown(socket.SHUT_RDWR)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib64/python2.7/socket.py", line 170, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
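
For reference, a minimal reproduction sketch of the driver pattern described above; the master URL and app name are placeholders, not taken from the question:

import time
from pyspark.sql import SparkSession

# Client-mode driver against a standalone master on k8s (placeholder URL).
spark = (SparkSession.builder
         .master("spark://spark-master:7077")
         .appName("example-job")
         .getOrCreate())

spark.range(10).count()  # any job; the jobs themselves always succeed

spark.stop()       # the traceback above sometimes appears after this
time.sleep(10)     # even a delay here does not prevent it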

My guess is that the cause is the parent Python process trying to fetch log messages from the already-terminated 'jvm' child process. But the weird part is that the error does not always appear...

Any suggestions?

1 Answer:

Answer 0: (score: 0)

The root cause is the 'py4j' log level.

I had set the Python log level to DEBUG, which makes the 'py4j' client raise connection errors against the 'java' side while PySpark is shutting down.

So setting the Python log level to INFO or higher fixes the problem, as in the sketch below.
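
For illustration, a minimal sketch of the fix using the standard logging module; the py4j client logs through loggers under the 'py4j' namespace (e.g. py4j.java_gateway in the traceback above):

import logging

# A root logger at DEBUG is what surfaced the shutdown noise, e.g.:
# logging.basicConfig(level=logging.DEBUG)

# Raising the py4j loggers to INFO (or higher) suppresses the
# connection errors emitted while the gateway socket is torn down.
logging.getLogger("py4j").setLevel(logging.INFO)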

ref:Gateway raises an exception when shut down

ref:Tune down the logging level for callback server messages

ref:PySpark Internals