I am trying to run a simple example with pyspark. Here is the code:
from pyspark import SparkContext
sc = SparkContext("local", "Simple App")
data = (sc.textFile("/opt/HistorCommande.csv")
        .map(lambda line: line.split(","))
        .map(lambda record: (record[0], record[1], record[2])))
NbCommande = data.count()
print("Nb de commandes: %d" % NbCommande)
However, when I run this code (cmd: ./bin/spark-submit /opt/test.py), I get the following error:
Traceback (most recent call last):
File "/opt/test.py", line 6, in <module>
NbCommande = data.count()
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1041, in count
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1032, in sum
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 906, in fold
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 809, in collect
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError
I don't know what this error means, so I can't figure out what is going wrong. I also tried the wordCount example and got the same error.
If anyone knows how to solve this, I would be very grateful.
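In case it helps anyone diagnose this, here is a minimal sanity check I could run (a sketch, not something I have verified): it first counts an in-memory RDD, which needs no input file, so if that also fails the problem is my Spark setup rather than the CSV. The app name "Sanity Check" is arbitrary; the file path is the one from my code above.

import os
from pyspark import SparkContext

sc = SparkContext("local", "Sanity Check")

# An in-memory RDD needs no input file; if this count() also raises
# Py4JJavaError, the Spark installation itself is likely the problem.
print(sc.parallelize([1, 2, 3]).count())  # expected output: 3

# Otherwise, check that the input file actually exists and is readable.
print(os.path.exists("/opt/HistorCommande.csv"))

sc.stop()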