从pyspark访问文件时无法连接到hadoop集群

时间:2016-03-25 17:08:57

标签: hadoop pyspark

我正在运行以下代码:

conf = SparkConf().setAppName("basicRegressionUbuntu").setMaster("spark://MyCUSTOMIP:7077")
sc = SparkContext(conf=conf)

rdd = sc.textFile("hdfs://MYHADOOPMASTERNODE:8020/sampleData/Sacramentorealestatetransactions.csv")

它引发了以下内容:

16/03/25 10:01:11 WARN security.UserGroupInformation: PriviledgedActionException as:hduser (auth:SIMPLE) cause:java.io.IOException: Failed to connect to /10.0.2.15:42939
Exception in thread "main" java.io.IOException: Failed to connect to /10.0.2.15:42939
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out: /10.0.2.15:42939
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

我知道文件路径存在,因为当我通过SSH连接到MYHADOOPMASTERNODE并执行hdfs dfs -ls /sampleData/时,它会显示我的文件。

非常感谢任何帮助!

0 个答案:

没有答案