Spark: I/O error constructing remote block reader

Time: 2017-07-25 17:48:04

Tags: hadoop apache-spark cluster-computing

I want to build a homemade Spark cluster from two computers on the same network. The setup is as follows:

A) 192.168.1.9: Spark master, with Hadoop HDFS installed

Hadoop has this core-site.xml:

<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>

B) 192.168.1.6: Spark only (slave)
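To see what A's configuration actually declares, it can help to dump the properties from core-site.xml. The following is a minimal standard-library Python sketch; the helper name hadoop_properties is hypothetical, and the XML is inlined from the question rather than read from A's Hadoop configuration directory. Note that 0.0.0.0 in fs.defaultFS is a wildcard "all interfaces" bind address, not one a remote client can connect to, which is why the code on B names hdfs://192.168.1.9:9000 explicitly.

```python
import xml.etree.ElementTree as ET

# The core-site.xml from the question, inlined so the sketch is self-contained.
CORE_SITE = """<configuration>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>"""


def hadoop_properties(xml_text: str) -> dict:
    """Return a {name: value} dict for every <property> in a Hadoop *-site.xml (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Strip surrounding whitespace, since site files are often indented.
            props[name.strip()] = (value or "").strip()
    return props


props = hadoop_properties(CORE_SITE)
print(props["fs.defaultFS"])
```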

From B, I want to use Spark to access files stored in A's HDFS:

...
# Load files
file_1 = "input_1.pkl"
file_2 = "input_2.pkl"
hdfs_base_path = "hdfs://192.168.1.9:9000/folderx/" 
sc.addFile(hdfs_base_path + file_1)
sc.addFile(hdfs_base_path + file_2)

# Get files back
with open(SparkFiles.get(file_1), 'rb') as fw:
    # use fw
    pass

However, when I test the program on B by running it with:

./spark-submit --master local program.py

The output is as follows:

17/07/25 19:02:51 INFO SparkContext: Added file hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl at hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl with timestamp 1501002171301
17/07/25 19:02:51 INFO Utils: Fetching hdfs://192.168.1.9:9000/bigdata/input_1_new_grid.pkl to /tmp/spark-838c3774-36ec-4db1-ab01-a8a8c627b100/userFiles-b4973f80-be6e-4f2e-8ba1-cd64ddca369a/fetchFileTemp1979399086141127743.tmp
17/07/25 19:02:51 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

Further on:

17/07/25 19:02:51 WARN DFSClient: Failed to connect to /127.0.0.1:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection refused
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

The program tries to connect to 127.0.0.1:50010, which is wrong. Do I need to install Hadoop on B as well? If not, what is the correct configuration? Thanks!
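One way to narrow this down is to test plain TCP reachability from B to the ports that appear in the log. Below is a minimal standard-library Python sketch; can_connect is a hypothetical helper name, and it only checks that a TCP handshake succeeds, not that an HDFS service is actually answering on that port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and name-resolution failures are all OSError subclasses.
        return False
```

For example, running can_connect("192.168.1.9", 9000) and can_connect("192.168.1.9", 50010) on B checks A's NameNode and DataNode ports directly. If those succeed while the log still reports 127.0.0.1:50010, the network path is fine and the problem is most likely the address A advertises for its DataNode, i.e. a hostname on A that resolves to 127.0.0.1, which is the kind of mapping the answer below fixes.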

1 Answer:

Answer 0 (score: 0)

By the way, in case someone comes here looking for a solution: I solved my problem by pointing quickstart.cloudera to the machine's real IP address instead of 127.0.0.1. The default /etc/hosts is:

127.0.0.1 quickstart.cloudera quickstart localhost localhost.domain

What you want is:

127.0.0.1 localhost localhost.domain
xxxIP_Address_oF_YOUR_VM quickstart.cloudera quickstart

You may also want to modify /usr/bin/cloudera-quickstart-ip, because the hosts file may be reset every time the VM restarts.
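The /etc/hosts change described in this answer can be sketched as a text transformation: remove the Cloudera host names from the loopback line and map them to the VM's real address instead. This is a minimal Python sketch, not the behavior of cloudera-quickstart-ip; the helpers lookup and repoint are hypothetical, 203.0.113.10 is a placeholder documentation address standing in for xxxIP_Address_oF_YOUR_VM, and it works on a string, so writing the result back to /etc/hosts (which requires root) is left out. lookup returns the first matching line, roughly mirroring how the hosts file is usually consulted.

```python
import ipaddress

# Default hosts line from the answer.
DEFAULT_HOSTS = "127.0.0.1 quickstart.cloudera quickstart localhost localhost.domain\n"


def lookup(hosts_text: str, name: str):
    """Return the IP of the first non-comment line that lists `name`, or None."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None


def repoint(hosts_text: str, names, new_ip: str) -> str:
    """Remove `names` from every existing line and map them to `new_ip` instead."""
    ipaddress.ip_address(new_ip)  # raises ValueError on a malformed address
    out = []
    for line in hosts_text.splitlines():
        body, sep, comment = line.partition("#")
        fields = body.split()
        if len(fields) >= 2:
            kept = [h for h in fields[1:] if h not in names]
            if not kept:
                continue  # the line only mapped names we are moving, so drop it
            line = " ".join([fields[0], *kept]) + (" " + sep + comment if sep else "")
        out.append(line)
    out.append(" ".join([new_ip, *names]))
    return "\n".join(out) + "\n"


fixed = repoint(DEFAULT_HOSTS, ("quickstart.cloudera", "quickstart"), "203.0.113.10")
print(fixed, end="")
```

Before the change, lookup(DEFAULT_HOSTS, "quickstart.cloudera") yields a loopback address; after it, the name resolves to the VM's address while localhost stays on 127.0.0.1.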