I am trying to write a simple job that counts random numbers. It works in local mode, and it also works against the YARN master in pyspark's interactive mode. However, when I spark-submit the same Python file, it fails. Note: the code does not write to any file.

OS: CentOS 7; Spark: 2.3.1; Python: 3.7 via miniconda
from pyspark import SparkContext
import random
from operator import add

# Connect to the cluster via YARN.
sc = SparkContext('yarn', 'app')
# 1000 (key, 1) pairs with random keys in [0, 5].
candidates = [(random.randint(0, 5), 1) for _ in range(1000)]
nums = sc.parallelize(candidates, 3)
# Count occurrences of each key and collect the result as a dict.
result = nums.reduceByKey(add).collectAsMap()
print(result)
sc.stop()
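For reference, this is roughly how I submit the job (the deploy mode is an assumption; the spark-submit path is reconstructed from the Spark install path that appears in the traceback below):

```shell
# Hypothetical submit command; --deploy-mode client is an assumption.
/opt/mapr/spark/spark-2.3.1/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  sparktest.py
```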
When submitted with spark-submit, the job fails with the following executor errors:

WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, node3, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 272, in main
    secret = UTF8Deserializer().loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 684, in loads
    return s.decode("utf-8") if self.use_unicode else s
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 32: invalid start byte
19/05/30 16:04:47 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, node1, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 326, in main
    filename = utf8_deserializer.loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 683, in loads
    s = stream.read(length)
ValueError: read length must be non-negative or -1
19/05/30 16:04:48 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
19/05/30 16:04:48 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 8, node3, executor 1): TaskKilled (Stage cancelled)
Traceback (most recent call last):
  File "/mapr/MyCluster/MapRPOC/sparktest.py", line 8, in <module>
    result = nums.reduceByKey(add).collectAsMap()
  File "/opt/mapr/spark/spark-2.3.1/python/lib/pyspark.zip/pyspark/rdd.py", line 1602, in collectAsMap
  File "/opt/mapr/spark/spark-2.3.1/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in collect
  File "/opt/mapr/spark/spark-2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/mapr/spark/spark-2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, node1, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 272, in main
    secret = UTF8Deserializer().loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 684, in loads
    return s.decode("utf-8") if self.use_unicode else s
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 32: invalid start byte