I'm running a Spark (1.2.1) standalone cluster on my virtual machines (Ubuntu 12.04). I can run examples like als.py and pi.py successfully, but I can't run the wordcount.py example because a connection error occurs.
bin/spark-submit --master spark://192.168.1.211:7077 /examples/src/main/python/wordcount.py ~/Documents/Spark_Examples/wordcount.py
The error message is as follows:
15/03/13 22:26:02 INFO BlockManagerMasterActor: Registering block manager a12:45594 with 267.3 MB RAM, BlockManagerId(0, a12, 45594)
15/03/13 22:26:03 INFO Client: Retrying connect to server: a11/192.168.1.211:9000. Already tried 4 time(s).
......
Traceback (most recent call last):
File "/home/spark/spark/examples/src/main/python/wordcount.py", line 32, in <module>
.reduceByKey(add)
File "/home/spark/spark/lib/spark-assembly-1.2.1 hadoop1.0.4.jar/pyspark/rdd.py", line 1349, in reduceByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1559, in combineByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1942, in _defaultReducePartitions
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 297, in getNumPartitions
......
py4j.protocol.Py4JJavaError: An error occurred while calling o23.partitions.
java.lang.RuntimeException: java.net.ConnectException: Call to a11/192.168.1.211:9000 failed on connection exception: java.net.ConnectException: Connection refused
......
I'm not using YARN or ZooKeeper, and all the virtual machines can connect to each other via passwordless SSH. I also set SPARK_LOCAL_IP for the master and the workers.
Answer 0 (score: 0)
I think the wordcount.py example is accessing HDFS to read the lines of a file (and then count the words), with something like:
sc.textFile("hdfs://<master-hostname>:9000/path/to/whatever")
Port 9000 is typically used for HDFS. Make sure that file is accessible, or just don't use HDFS for this example :). I hope this helps.
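If you only want the example to read a plain local file instead of HDFS, you can point textFile at a file:// path. Below is a minimal sketch of the same word count, assuming a placeholder input file at /tmp/input.txt and the standalone master from the question; both are assumptions, adjust them to your setup.

from operator import add
from pyspark import SparkContext

# Minimal word count sketch (Spark 1.2.x PySpark API).
sc = SparkContext(appName="PythonWordCount")

# The file:// prefix forces Spark to read from the local filesystem instead
# of the default filesystem (often HDFS when a Hadoop config is present).
# /tmp/input.txt is a hypothetical path; replace it with a real file that
# exists on every worker node.
lines = sc.textFile("file:///tmp/input.txt")

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(add))

for word, count in counts.collect():
    print("%s: %i" % (word, count))

sc.stop()

You could then submit it the same way as before, e.g. bin/spark-submit --master spark://192.168.1.211:7077 local_wordcount.py (local_wordcount.py being whatever you name the script above), and no HDFS NameNode on port 9000 is needed.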