New to Spark - getting java.net.BindException: Cannot assign requested address

Date: 2016-03-09 14:31:37

Tags: apache-spark pyspark

I am new to Spark. I just installed version 1.6.0 on my laptop (Ubuntu) and tried an example from the Python shell (pyspark). However, I cannot figure out what this error is telling me. Can you help? Any help is appreciated.

>>> lines = sc.textFile("spark-1.6.0/README.md")
>>> lines.count()
Traceback (most recent call last):                                              
  File "<stdin>", line 1, in <module>
  File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 1004, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 995, in sum
    return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
  File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 869, in fold
    vals = self.mapPartitions(func).collect()
  File "/home/ricky/spark-1.6.0/python/pyspark/rdd.py", line 771, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/ricky/spark-1.6.0/python/lib/py4j-0.9-    src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/home/ricky/spark-1.6.0/python/pyspark/sql/utils.py", line 45, in   deco
    return f(*a, **kw)
  File "/home/ricky/spark-1.6.0/python/lib/py4j-0.9-   src.zip/py4j/protocol.py", line 308, **in get_return_value**
py4j.protocol.Py4JJavaError: An error occurred while calling     z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.net.BindException: Cannot assign requested address
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.ServerSocket.bind(ServerSocket.java:375)
        at java.net.ServerSocket.<init>(ServerSocket.java:237)
        at org.apache.spark.api.python.PythonRDD$.serveIterator(PythonRDD.scala:637)
        at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
        at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0)

Ricky,

It looks like you want to figure out how to debug the problem.

1) First go to the conf directory and copy the log4j.properties.template file to log4j.properties. In that file, change the log level to DEBUG (log4j.rootCategory=DEBUG, console), then restart the pyspark shell and you should start seeing a lot more debug output. (A lighter-weight alternative is sketched after step 2 below.)

2) In your code, change it to:

    lines = sc.textFile("README.md")
    lines.count()
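Not part of the original answer: as a lighter-weight alternative to editing log4j.properties, the running shell's log level can also be raised directly from Python. SparkContext.setLogLevel is a real pyspark method (available since Spark 1.4); whether DEBUG output alone is enough to diagnose this particular issue is an assumption.

    >>> # Raise verbosity for the current SparkContext only; no file edits or shell restart needed.
    >>> sc.setLogLevel("DEBUG")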

When I run these steps locally, this is the log I see:

    16/03/09 08:16:29 DEBUG BlockManager: Getting local block broadcast_0
    16/03/09 08:16:29 DEBUG BlockManager: Level for block broadcast_0 is StorageLevel(true, true, false, true, 1)
    16/03/09 08:16:29 DEBUG BlockManager: Getting block broadcast_0 from memory
    16/03/09 08:16:29 DEBUG HadoopRDD: Creating new JobConf and caching it for later re-use
    16/03/09 08:16:29 DEBUG : address: SUNILPATIL.local/10.250.57.78 isLoopbackAddress: false, with host 10.250.57.78 SUNILPATIL.local
    16/03/09 08:16:29 DEBUG FileInputFormat: Time taken to get FileStatuses: 8
    16/03/09 08:16:29 INFO FileInputFormat: Total input paths to process : 1
    16/03/09 08:16:29 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 15
    16/03/09 08:16:29 DEBUG ClosureCleaner: +++ Cleaning closure (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12) +++
    16/03/09 08:16:29 DEBUG ClosureCleaner:  + declared fields: 2

Spark uses Hadoop IO to read files, and when the FileInputFormat starts up it tries to connect back to the local loopback address. That may be your problem. If not, post the detailed stack trace.
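Not part of Sunil's answer, but a minimal diagnostic sketch for this kind of BindException: the stack trace shows a ServerSocket failing to bind inside PythonRDD.serveIterator, which (as far as I can tell in this Spark version) binds a small local server socket on "localhost", so a quick check is whether "localhost" and the machine's hostname actually resolve to a usable local address. The snippet below uses only Python's standard socket module; the remedies in the comments (fixing /etc/hosts, or exporting SPARK_LOCAL_IP before launching pyspark) are common workarounds and an assumption here, not something confirmed by the original post.

    import socket

    # Check what "localhost" and the machine hostname resolve to. If "localhost"
    # does not resolve to 127.0.0.1, or resolves to an address that is not
    # configured on any interface, binding a server socket there can fail with
    # "Cannot assign requested address".
    for name in ("localhost", socket.gethostname()):
        try:
            print("%s -> %s" % (name, socket.gethostbyname(name)))
        except socket.error as e:
            print("%s -> resolution failed: %s" % (name, e))

    # Common remedies (assumptions, not from the original answer):
    #   * make sure /etc/hosts contains a line like "127.0.0.1  localhost"
    #   * or export SPARK_LOCAL_IP=127.0.0.1 before starting the pyspark shell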

Sunil