We get the following in our Spark logs:
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage
The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
My Ambari cluster contains only 3 worker machines, and each worker has only one data disk.
I searched on Google and found that the solution may involve block replication. Block replication in HDFS is configured to 3 by default, and I found recommendations to set "block replication" to 1 instead of 3.
Question: does that make sense?
Also, could the fact that my worker machines have only one data disk be part of the problem?
Block replication: the total number of copies of each file in the file system is the number specified by the dfs.replication setting. Setting dfs.replication = 1 means there is only a single copy of each file in the file system.
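For reference, the cluster-wide default could be set in hdfs-site.xml along these lines (a sketch; the value 1 matches the recommendation above, and the exact file location depends on your Ambari setup — in Ambari this is normally changed through the HDFS configuration UI rather than by editing the file directly):

```xml
<!-- hdfs-site.xml: default replication factor for newly created files -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```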
Full log:
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[34.2.31.31:50010,DS-8234bb39-0fd4-49be-98ba-32080bc24fa9,DISK], DatanodeInfoWithStorage[34.2.31.33:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]], original=[DatanodeInfoWithStorage[34.2.31.31:50010,DS-8234bb39-0fd4-49be-98ba-32080bc24fa9,DISK], DatanodeInfoWithStorage[34.2.31.33:50010,DS-b4758979-52a2-4238-99f0-1b5ec45a7e25,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1110)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1268)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:993)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:500)
---2018-01-30T15:15:15.015 INFO [][][] [dal.locations.LocationsDataFramesHandler]
Answer (score: 0)
I faced the same problem. The default block replication is 3, so unless you specify otherwise, all files are created with a replication factor of 3.
If any datanode becomes unreachable (a network issue, or no space left on its disk), replication will fail.
Check the datanode status with the following command:
hdfs dfsadmin -report
In my case I had 2 development nodes, one master and one datanode, so I changed the replication factor to 1.
You can first test this from the HDFS CLI, as follows:
echo "test file line1" > copy1
echo "test file line2" > copy2
hdfs dfs -Ddfs.replication=1 -touchz /tmp/appendtest.txt
hdfs dfs -appendToFile copy1 /tmp/appendtest.txt
hdfs dfs -appendToFile copy2 /tmp/appendtest.txt
If you do not specify the replication factor in the touchz command, you will get the same error when trying to append the local file copy2.
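Note that dfs.replication only affects newly created files. For files that already exist, the replication factor can be lowered afterwards with hdfs dfs -setrep (a sketch, reusing the /tmp/appendtest.txt path from the example above; -w waits until the change has taken effect):

```shell
# Lower the replication factor of an existing file to 1
hdfs dfs -setrep -w 1 /tmp/appendtest.txt

# Verify the replication factor (%r prints it)
hdfs dfs -stat %r /tmp/appendtest.txt
```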
The following settings on the hdfsConfiguration object solved the problem for me:
hdfsConfiguration.set("fs.defaultFS", configuration.getString("hdfs.uri"))
hdfsConfiguration.set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
hdfsConfiguration.set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
hdfsConfiguration.set("dfs.support.append", "true")
hdfsConfiguration.set("dfs.replication", "1")
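The exception message itself also points at 'dfs.client.block.write.replace-datanode-on-failure.policy'. On very small clusters (3 or fewer datanodes) a common workaround, offered here as a sketch rather than a guaranteed fix, is to relax that policy on the client so the write pipeline keeps going with the remaining datanodes instead of failing while it searches for a replacement:

```scala
// Sketch: relax the datanode-replacement policy for small clusters.
// NEVER means the client never tries to replace a failed datanode in
// the pipeline; it continues writing to the datanodes that remain.
hdfsConfiguration.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER")
```

The trade-off is reduced durability during the write: if more datanodes fail, the block may end up under-replicated until HDFS re-replicates it in the background.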