Writing data to the local disk on each data node

Time: 2016-05-17 08:47:41

Tags: hadoop caching mapreduce hadoop2

I want to store some values from the map task on the local disk of each data node. For example,

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // Process the input record
    List<Object> cache = new ArrayList<Object>();
    // Add values to the cache
    // Serialize the cache to a local file on this data node
}

How can I store this cache object on the local disk of each data node? If I write the cache out inside the map function above, performance will suffer badly because of the I/O on every call.

What I mean is: is there a way to wait until the map task on this data node has finished completely, and only then store the cache to local disk? Or does Hadoop have a built-in feature for this?
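The newer MapReduce API does expose a hook for this: Mapper's cleanup(Context) is called once per map task, after the last input record, so buffered values can be written out in a single pass. A minimal sketch of that buffer-then-flush pattern, assuming a simple text-processing mapper (the class name CachingMapper, the generic types, and the file name cache.txt are placeholders, not from the original post):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CachingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // In-memory buffer; one mapper instance lives for the whole map task.
    private final List<String> cache = new ArrayList<String>();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Process the record and buffer the result instead of
        // touching the disk on every call.
        cache.add(value.toString());
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Runs once per map task, after all input records are processed.
        // The relative path lands in the container's working directory.
        try (PrintWriter out = new PrintWriter(new FileWriter("cache.txt"))) {
            for (String item : cache) {
                out.println(item);
            }
        }
    }
}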

1 answer:

Answer 0 (score: 2)

Please see the example below. The file created will be located under the directories that the NodeManager uses for containers. This is the configuration property yarn.nodemanager.local-dirs in yarn-site.xml, or the default inherited from yarn-default.xml, which is under /tmp.
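For illustration, the property looks like this in yarn-site.xml (the directory paths below are placeholder values, not from the original answer):

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- Comma-separated list of local directories where the NodeManager
       places container working directories. -->
  <value>/data/1/yarn/local,/data/2/yarn/local</value>
</property>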

Please also see @Chris Nauroth's answer, which says this is just for debugging purposes and is not recommended as a permanent production configuration; it clearly describes why.

public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    // do some hadoop stuff, like counting words
    // A relative path resolves against the container's working directory,
    // which sits under one of the yarn.nodemanager.local-dirs directories.
    String path = "newFile.txt";
    try {
        File f = new File(path);
        f.createNewFile();
    } catch (IOException e) {
        // Log to stdout/stderr so the message shows up in the container logs,
        // then rethrow so the task is marked as failed.
        System.out.println("Message easy to look up in the logs.");
        System.err.println("Error easy to look up in the logs.");
        e.printStackTrace();
        throw e;
    }
}
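A caveat that echoes the point above: YARN normally deletes a container's local directories once the container finishes, so files written this way are short-lived. The property yarn.nodemanager.delete.debug-delay-sec can delay that cleanup while debugging, which is part of why this approach is not recommended for production.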