Question

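I have a requirement to collect some columns onto the Spark driver, and some of those columns contain non-ASCII characters. Collecting them raises this error: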

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 187: ordinal not in range(128).
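For context, the same kind of error can be reproduced without Spark. The sketch below is only an illustration: the byte string `raw` and the helper `try_decode` are hypothetical and not taken from the question's data. It decodes UTF-8 encoded bytes once with the ASCII codec and once with UTF-8.

```python
# Hypothetical sample: bytes starting with 0xef (here a UTF-8 byte order mark).
raw = b"\xef\xbb\xbfname"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# The ASCII codec rejects any byte above 0x7f, so this attempt fails.
ascii_text, ascii_error = try_decode(raw, "ascii")

# The same bytes decode cleanly when the codec matches the actual encoding.
utf8_text, utf8_error = try_decode(raw, "utf-8")
```

Here `ascii_error` holds a message of the same form as the one above, which suggests that somewhere UTF-8 data is being decoded with the default ASCII codec.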

Any ideas on how I can apply a UDF to the column contents while fetching them, and then collect the result onto the driver?

I am using PySpark.
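One possible shape for such a UDF, as a rough sketch only: the function name `to_ascii_safe` is hypothetical, and replacing non-ASCII characters with escapes changes the data, so this is a workaround rather than a fix. Whether it avoids the error depends on where the decoding actually fails. The function is plain Python; wrapping it with `pyspark.sql.functions.udf` is shown in comments as an untested assumption about how it would be used.

```python
def to_ascii_safe(value):
    """Replace every non-ASCII character in value with a backslash escape."""
    if value is None:
        # Null column values are passed to Python UDFs as None.
        return None
    return value.encode("ascii", "backslashreplace").decode("ascii")


# Assumed usage in PySpark (not run here; df and "name" are placeholders):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   safe = udf(to_ascii_safe, StringType())
#   rows = df.withColumn("name", safe(df["name"])).collect()
```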

Answer 1

I had the same problem. This worked for me:

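# Wrap stdout and stderr in UTF-8 writers so that text written to them is
# encoded as UTF-8 rather than with the default ASCII codec.
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)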
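To see what the wrapper does, here is a small sketch that uses an in-memory byte stream in place of the real `sys.stdout`. The names `buffer`, `writer`, and `encoded` are just for illustration. Note that on Python 3 `sys.stdout` is already a text stream, so the object to wrap there would normally be `sys.stdout.buffer`.

```python
import codecs
import io

# An in-memory byte stream stands in for the real sys.stdout.
buffer = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf8")(buffer)

# Text written through the wrapper is encoded as UTF-8 before reaching the stream.
writer.write(u"caf\u00e9")
encoded = buffer.getvalue()
```

The non-ASCII character ends up in `encoded` as two UTF-8 bytes instead of triggering an ASCII codec error.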