I am new to Spark. I just installed version 1.6.0 on my laptop (Ubuntu) and was working through the examples in the Python shell (pyspark), but I can't figure out what this error is telling me. Can you help? Any help is appreciated.
>>> lines = sc.textFile("spark-1.6.0/README.md")
>>> lines.count()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 1004, in count
return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 995, in sum
return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 869, in fold
vals = self.mapPartitions(func).collect()
File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 771, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/home/ricky/spark-1.6.0/python/lib/py4j-0.9- src.zip/py4j/java_gateway.py", line 813, in __call__
File "/home/ricky/spark-1.6.0/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/home/ricky/spark-1.6.0/python/lib/py4j-0.9- src.zip/py4j/protocol.py", line 308, **in get_return_value**
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.net.BindException: Cannot assign requested address
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at java.net.ServerSocket.<init>(ServerSocket.java:237)
at org.apache.spark.api.python.PythonRDD$.serveIterator(PythonRDD.scala:637)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Answer 0 (score: 0)
Ricky,
It looks like you are trying to figure out how to debug this problem.
1) First go to the conf directory and copy the log4j.properties.template file to log4j.properties. In that file, change the log level to DEBUG (log4j.rootCategory=DEBUG, console), then restart the pyspark shell. You should start seeing much more debug output.
2) In your code, change the read to lines = sc.textFile("README.md") followed by lines.count() (a sketch of this is shown just below).
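A minimal sketch of that snippet, assuming the pyspark shell is started from inside the spark-1.6.0 directory so the relative path resolves (sc is the SparkContext that the shell creates for you):

>>> lines = sc.textFile("README.md")   # lazily points the RDD at the local README
>>> lines.count()                      # action that forces the file to actually be read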
When I executed these steps locally, this is the log output I saw:

16/03/09 08:16:29 DEBUG BlockManager: Getting local block broadcast_0
16/03/09 08:16:29 DEBUG BlockManager: Level for block broadcast_0 is StorageLevel(true, true, false, true, 1)
16/03/09 08:16:29 DEBUG BlockManager: Getting block broadcast_0 from memory
16/03/09 08:16:29 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
16/03/09 08:16:29 DEBUG : address: SUNILPATIL.local/10.250.57.78 isLoopbackAddress: false, with host 10.250.57.78 SUNILPATIL.local
16/03/09 08:16:29 DEBUG FileInputFormat: Time taken to get FileStatuses: 8
16/03/09 08:16:29 INFO FileInputFormat: Total input paths to process : 1
16/03/09 08:16:29 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 15
16/03/09 08:16:29 DEBUG ClosureCleaner: +++ Cleaning closure (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12) +++
16/03/09 08:16:29 DEBUG ClosureCleaner:  + declared fields: 2
Spark uses Hadoop IO to read the file, and when FileInputFormat starts up it tries to connect back to the local loopback address. That could be your problem. If not, please post the detailed stack trace.
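Since the BindException above says a local address cannot be bound, one thing worth checking (my own suggestion, not something spelled out in the answer) is whether your machine's hostname resolves to a usable address. If it does not, common workarounds are adding the hostname to /etc/hosts or setting the SPARK_LOCAL_IP environment variable (for example to 127.0.0.1) before starting pyspark. A quick resolution check from the same shell:

>>> import socket
>>> socket.gethostname()                          # the hostname your machine reports
>>> socket.gethostbyname(socket.gethostname())    # should return a reachable IP; an exception here points at the resolution problem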
Sunil