PySpark: error when calling the 'count' function

Asked: 2017-02-06 14:06:02

Tags: pyspark

I am trying to run a simple example with PySpark. Here is the code:

from pyspark import SparkContext
sc = SparkContext("local", "Simple App")
data = sc.textFile("/opt/HistorCommande.csv") \
         .map(lambda line: line.split(",")) \
         .map(lambda record: (record[0], record[1], record[2]))
NbCommande = data.count()
print("Nb de commandes: %d" % NbCommande)

But when I run this code (cmd: ./bin/spark-submit /opt/test.py), I get the following error:

Traceback (most recent call last):
  File "/opt/test.py", line 6, in <module>
    NbCommande = data.count()
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1041, in count
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1032, in sum
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 906, in fold
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 809, in collect
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/opt/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError

I don't know what this means, so I can't work out how to fix it. I also tried the wordCount example and got the same error.
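
For reference, here is a minimal sketch of the same job with an explicit check on the input path (assuming the file is meant to live at /opt/HistorCommande.csv on the local filesystem). Since textFile is lazy, a missing or unreadable file only fails at the first action, which is exactly where count() sits, and it surfaces as an opaque Py4JJavaError:

import os
from pyspark import SparkContext

# Hypothetical guard: fail fast with a readable message if the CSV is
# absent, instead of letting the lazy textFile() blow up inside count().
path = "/opt/HistorCommande.csv"
if not os.path.exists(path):
    raise SystemExit("Input file not found: %s" % path)

sc = SparkContext("local", "Simple App")
data = (sc.textFile(path)
          .map(lambda line: line.split(","))
          .map(lambda record: (record[0], record[1], record[2])))
print("Nb de commandes: %d" % data.count())
sc.stop()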

If anyone knows how to solve this problem, I would be very grateful.

0 Answers:

No answers yet.