Background:
This is PySpark code run from a plain Python process, not from the pyspark shell. The job itself always runs fine and produces the expected results. But "sometimes", when the code finishes and is exiting, the errors below appear after spark.stop(), even during a trailing time.sleep(10).
{{py4j.java_gateway:1038}} INFO - Error while receiving.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1035, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
Py4JNetworkError: Answer from Java side is empty
[2018-11-22 09:06:40,293] {{root:899}} ERROR - Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 883, in send_command
    response = connection.send_command(command)
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 1040, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
[2018-11-22 09:06:40,293] {{py4j.java_gateway:443}} DEBUG - Exception while shutting down a socket
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/py4j-0.10.4-py2.7.egg/py4j/java_gateway.py", line 441, in quiet_shutdown
    socket_instance.shutdown(socket.SHUT_RDWR)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
  File "/usr/lib64/python2.7/socket.py", line 170, in _dummy
    raise error(EBADF, 'Bad file descriptor')
error: [Errno 9] Bad file descriptor
My guess is that the parent Python process is trying to fetch log messages from the already-terminated 'jvm' child process. The weird part is that the error does not always appear...
Any suggestions?
Answer 0 (score: 0)
The root cause is the 'py4j' log level.
I had set the Python logging level to DEBUG, which makes the 'py4j' client surface connection errors against the 'java' side while PySpark is shutting down.
So setting the Python logging level for 'py4j' to INFO or higher resolves the problem.
ref: Gateway raises an exception when shut down
ref: Tune down the logging level for callback server messages