I'm working in PySpark, installed on an Ubuntu 16.04 machine. I need to export the result of a long piece of code. The result is a DataFrame, and I want to save it as a CSV file. Everything runs fine, but the last line of code fails every time with the error below:
final_df.write.format('txt').save('final_test1')
Could you point me in the right direction? What should I do?
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1035, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 883, in send_command
response = connection.send_command(command)
File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1040, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 341, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
self.handle()
File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/pyspark/accumulators.py", line 235, in handle
num_updates = read_int(self.rfile)
File "/home/ubuntu/spark-2.1.1-bin-hadoop2.7/python/pyspark/serializers.py", line 577, in read_int
raise EOFError
EOFError
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
<ipython-input-22-f56812202624> in <module>()
1 final_df.cache()
----> 2 final_df.write.format('csv').save('final_test1')
~/spark-2.1.1-bin-hadoop2.7/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
548 self._jwrite.save()
549 else:
--> 550 self._jwrite.save(path)
551
552 @since(1.4)
~/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1134
1135 for temp_arg in temp_args:
~/spark-2.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
~/spark-2.1.1-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
325 raise Py4JError(
326 "An error occurred while calling {0}{1}{2}".
--> 327 format(target_id, ".", name))
328 else:
329 type = answer[1]
Py4JError: An error occurred while calling o3911.save
Answer 0 (score: 0)
Maybe you should try this:
final_df.write.csv('final_test1.csv')
Answer 1 (score: 0)
The error could also be caused by copying the wrong jar file. Try copying the assembly jar from Maven:
https://search.maven.org/search?q=a:spark-streaming-kafka-0-8-assembly_2.11