Spark on YARN raises UnicodeDecodeError

Asked: 2019-05-30 16:21:12

Tags: apache-spark pyspark yarn

I am trying to write code that does a simple number-count job. It works with a local master, and it also works against the YARN master in pyspark's interactive mode. However, when I try to spark-submit this Python file, it fails. Note: the code does not write to any file.

OS: CentOS 7; Spark: 2.3.1; Python: 3.7 with Miniconda
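The question does not show the actual spark-submit command. A typical invocation for this setup might look like the following; the deploy mode and flags are assumptions, only the script path is taken from the traceback below:

```shell
# Hypothetical invocation -- flags are assumptions, not from the post
spark-submit \
  --master yarn \
  --deploy-mode client \
  /mapr/MyCluster/MapRPOC/sparktest.py
```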

from pyspark import SparkContext
import random
from operator import add

sc = SparkContext('yarn', 'app')
candidates = [(random.randint(0, 5), 1) for cnt in range(1000)]
nums = sc.parallelize(candidates, 3)
result = nums.reduceByKey(add).collectAsMap()
print(result)
sc.stop()
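For reference, the same counting logic can be checked locally without Spark, since `reduceByKey(add)` over `(key, 1)` pairs is just a count of the keys. A minimal sketch using `collections.Counter` (a fixed seed is added here for reproducibility; the original code is unseeded):

```python
import random
from collections import Counter

random.seed(42)  # added for reproducibility; not in the original code
candidates = [(random.randint(0, 5), 1) for cnt in range(1000)]

# Equivalent of reduceByKey(add): sum the 1s per key, i.e. count each key
result = dict(Counter(k for k, _ in candidates))
print(result)
```

All 1000 pairs are accounted for across keys 0 through 5, which is the shape of output the `collectAsMap()` call above should produce when the job succeeds.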

WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, node3, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 272, in main
    secret = UTF8Deserializer().loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 684, in loads
    return s.decode("utf-8") if self.use_unicode else s
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 32: invalid start byte
19/05/30 16:04:47 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, node1, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 326, in main
    filename = utf8_deserializer.loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 683, in loads
    s = stream.read(length)
ValueError: read length must be non-negative or -1
19/05/30 16:04:48 ERROR TaskSetManager: Task 1 in stage 0.0 failed 4 times; aborting job
19/05/30 16:04:48 WARN TaskSetManager: Lost task 0.2 in stage 0.0 (TID 8, node3, executor 1): TaskKilled (Stage cancelled)
Traceback (most recent call last):
  File "/mapr/MyCluster/MapRPOC/sparktest.py", line 8, in <module>
    result = nums.reduceByKey(add).collectAsMap()
  File "/opt/mapr/spark/spark-2.3.1/python/lib/pyspark.zip/pyspark/rdd.py", line 1602, in collectAsMap
  File "/opt/mapr/spark/spark-2.3.1/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in collect
  File "/opt/mapr/spark/spark-2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/mapr/spark/spark-2.3.1/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 7, node1, executor 2): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/worker.py", line 272, in main
    secret = UTF8Deserializer().loads(infile)
  File "/opt/miniconda3/lib/python3.7/site-packages/pyspark/serializers.py", line 684, in loads
    return s.decode("utf-8") if self.use_unicode else s
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 32: invalid start byte

0 Answers:

No answers yet