I get an error when converting my CSV into a DataFrame in PySpark.
import csv

read_rdd = sc.textFile("path to my container/myfile.csv")
intermediate_rdd = read_rdd.mapPartitions(lambda x: csv.reader(x, delimiter=","))
header = intermediate_rdd.first()
data_1 = intermediate_rdd.filter(lambda row: row != header).toDF(header)
data_1.show(5)
The show() call fails with:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 115: ordinal not in range(128)
Answer 0 (score: 0)
import csv
from pyspark.sql import Row

read_rdd = sc.textFile("path/to/file")
# Parse each partition's lines with the csv module
intermediate_rdd = read_rdd.mapPartitions(lambda x: csv.reader(x, delimiter=","))
# Take the first row as the header, then drop it from the data before building the DataFrame
header = intermediate_rdd.first()
data = intermediate_rdd.filter(lambda row: row != header).toDF(header)
data.show(20)
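For reference, the UnicodeEncodeError in the question typically means Python 2 tried to encode a non-ASCII character (U+2026, the ellipsis) with the default 'ascii' codec while show() was printing rows. If you are on Spark 2.x or later, a common way to sidestep both the manual header filtering and the encoding issue is Spark's built-in CSV reader. The following is only a minimal sketch, not part of the original answer; the path and app name are placeholders, and it assumes a SparkSession is available (or can be created as shown):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-df").getOrCreate()

# Spark's CSV reader handles the header row and UTF-8 decoding itself,
# so there is no need to strip the header out manually.
df = spark.read.csv(
    "path/to/myfile.csv",   # placeholder path; replace with the real container path
    header=True,            # use the first line as column names
    inferSchema=True,       # infer column types instead of defaulting to strings
    encoding="UTF-8",       # decode the file as UTF-8
)
df.show(5)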