I have PyCharm 2018.2 and Apache Spark 2.3.3 installed. I have also installed the PySpark package, version 2.4.3. When I run the program, I get the error in the title.
from pyspark import SparkConf, SparkContext
import collections
conf = SparkConf().setMaster("local").setAppName("RatingsHistogram")
sc = SparkContext(conf=conf)
lines = sc.textFile("file:///SparkCourse/ml-100k/u.data")
ratings = lines.map(lambda x: x.split()[2])  # the error is raised when this line executes
result = ratings.countByValue()
sortedResults = collections.OrderedDict(sorted(result.items()))
for key, value in sortedResults.items():
    print("%s %i" % (key, value))
The output should be the number of movies for each rating:
1 6110
2 11370
3 27145
I got this result when I ran the program from the Enthought Canopy command prompt, but I get the error in PyCharm.
Answer 0 (score: 0)
I had the same problem; I am working through the same course as you. The error went away when I changed the pyspark version installed in PyCharm to 2.3.x.
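To check whether you are in the same situation, a small sketch like the following compares the pyspark package seen by the PyCharm interpreter with the Spark installation. It assumes the mismatch between the two (2.4.3 versus 2.3.3 in the question) is the cause, which is what worked for me but is not confirmed in general. The helper names major_minor and versions_match are made up for this example, and the Spark version is hard-coded from the question rather than detected.

```python
def major_minor(version):
    """Return the (major, minor) part of a version string such as '2.3.3'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def versions_match(pyspark_version, spark_version):
    """True when both versions belong to the same major.minor release line."""
    return major_minor(pyspark_version) == major_minor(spark_version)


if __name__ == "__main__":
    # Assumption: the local Spark installation is 2.3.3, as stated in the question.
    spark_version = "2.3.3"
    try:
        import pyspark  # the package installed in the PyCharm project interpreter
        installed = pyspark.__version__
    except ImportError:
        installed = None

    if installed is None:
        print("pyspark is not installed in this interpreter")
    elif versions_match(installed, spark_version):
        print("pyspark %s matches Spark %s" % (installed, spark_version))
    else:
        print("pyspark %s does not match Spark %s" % (installed, spark_version))
```

Run it with the same interpreter that PyCharm uses for the project. If it reports a mismatch, change the pyspark package version in the project interpreter settings to a 2.3.x release (for example with pip install pyspark==2.3.3, assuming that release is available from your package index).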