I am running PySpark in a Jupyter notebook on Windows. Whenever I execute a Spark action, such as rdd.take(n) or rdd.count(), I get a Unicode warning in the command prompt:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
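For context, my understanding (an assumption on my part, not something I found in the Spark docs) is that this is the standard Python 2 warning for comparing a non-ASCII byte string with a unicode string; a minimal reproduction outside Spark:

# -*- coding: utf-8 -*-
# Python 2.7: comparing a non-ASCII byte string with a unicode string
# makes Python try to decode the bytes as ASCII; that fails, the values
# are treated as unequal, and the same UnicodeWarning is emitted.
print('\xe9' == u'\xe9')  # prints False and emits the UnicodeWarning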
For example, the following code produces the correct output but still raises the warning:
%%time
# read the file as raw byte strings and split each line on tabs
distFile = sc.textFile(pth, use_unicode=False)
distFile = distFile.map(lambda x: x.split("\t"))
distFile.take(10)
Here pth is the path to a tab-delimited text file.
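One variant I could try (a sketch based on my guess that the bytes-vs-unicode comparison is triggered by use_unicode=False; untested) is to let textFile decode each line to unicode up front:

%%time
# use_unicode=True (the PySpark default) decodes lines to unicode,
# so any downstream comparison should be unicode-vs-unicode
distFile = sc.textFile(pth, use_unicode=True)
distFile = distFile.map(lambda x: x.split(u"\t"))
distFile.take(10)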
If I run a larger job, say rdd.count() or rdd.collect(), the Unicode warning is printed after every task completes! The console output then looks like this:
[Stage 30:================> (1483 + 32) / 2272]
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
[Stage 30:================> (1484 + 32) / 2272]
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
C:\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py:48: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
...and so on.
I suspect this is a problem with my Jupyter setup, because I do not get the Unicode warnings if I run the code above directly in the PySpark shell. Does anyone know a way around this?
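For what it's worth, here is one suppression idea; I have not verified it, and I am assuming the warnings come from the Python worker processes, so a plain warnings.filterwarnings call in the notebook may not reach them:

# untested sketch: set PYTHONWARNINGS before the SparkContext is
# created, hoping the worker processes inherit the environment and
# filter the warning themselves
import os
os.environ['PYTHONWARNINGS'] = 'ignore::UnicodeWarning'
# ...then create sc (or restart the kernel) so workers pick it up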
OS: Windows Server 2012
Python 2.7.12
Anaconda 4.2.0 (64-bit)
Jupyter Notebook 4.2.3
Spark 2.0.2