PySpark traceback: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte

Date: 2017-01-26 09:51:59

Tags: python utf-8 pyspark

I am running Spark from Python and simply followed code provided by others, but it fails with the traceback: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte

# -*- coding: utf-8 -*-
from pyspark import SparkContext, SparkConf
import os
import sys

# Python 2 workaround: force the default string encoding to UTF-8.
reload(sys)
sys.setdefaultencoding('utf-8')

# Point PySpark at the local Spark installation.
os.environ['SPARK_HOME'] = r'D:\spark-2.1.0-bin-hadoop2.7'

appName = "jhl_spark_1"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

rdd = sc.parallelize([2, 3, 4])
print sorted(rdd.flatMap(lambda x: range(1, x)).collect())

And the traceback:

 File "F:/eclipse/�ı��ھ�/������APRIORI/spark.py", line 14, in <module>
 print sorted(rdd.flatMap(lambda x: range(1, x)).collect())
 File "D:\Anaconda2\lib\site-packages\pyspark\rdd.py", line 808, in collect with SCCallSiteSync(self.context) as css:
 File "D:\Anaconda2\lib\site-packages\pyspark\traceback_utils.py", line 72, in __enter__self._context._jsc.setCallSite(self._call_site)
 File "D:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 1124, in __call__args_command, temp_args = self._build_args(*args)
 File "D:\Anaconda2\lib\site-packages\py4j\java_gateway.py", line 1094, in _build_args
[get_command_part(arg, self.pool) for arg in new_args])
 File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 283, in get_command_part
command_part = STRING_TYPE + escape_new_line(parameter)
 File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 183, in escape_new_line
return smart_decode(original).replace("\\", "\\\\").replace("\r", "\\r").\
 File "D:\Anaconda2\lib\site-packages\py4j\protocol.py", line 210, in smart_decode
return unicode(s, "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 22: invalid continuation byte
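
Note: the byte 0xce is not valid UTF-8 at that position; it looks like the lead byte of a GBK-encoded Chinese character from the script's folder name (the mangled path in the first traceback frame), which py4j's smart_decode then tries to read as UTF-8 while PySpark is setting the call site. Below is a minimal sketch of that failing decode, assuming the path is GBK-encoded; the Chinese folder name is hypothetical and stands in for whatever the real path contains.

# -*- coding: utf-8 -*-
# Sketch of the decode failure only, not a fix. Assumption: the .py file lives
# under a folder whose name contains Chinese characters, which Windows passes
# to Python as GBK bytes. The folder name below is hypothetical.
script_path = u'F:/eclipse/文本挖掘/spark.py'.encode('gbk')

try:
    # This mirrors what py4j's smart_decode does: unicode(s, "utf-8").
    unicode(script_path, 'utf-8')
except UnicodeDecodeError as e:
    print e  # 'utf8' codec can't decode byte 0xce in position ...: invalid continuation byte

If that is indeed the cause, moving spark.py to a folder whose full path contains only ASCII characters usually makes this error disappear.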

0 Answers:

There are no answers yet.