job.addCacheFile throws a FileNotFound error

Asked: 2020-04-30 00:52:00

Tags: hadoop mapreduce

I am trying to use job.addCacheFile to add a file to the MapReduce distributed cache for a map-side join, but it throws a FileNotFound error. I have looked at several similar questions, but none of them fit my case. I am using Hadoop 2.6.5. Here is what I did.

In the Driver class:

Configuration conf = super.getConf();

// absolute path on HDFS
// not sure if relative path or absolute path matters here
Path fileToBeCached = new Path("/test-data/cacheFiles");
Job job = Job.getInstance(conf);
output.getFileSystem(conf).delete(output, true);

FileSystem fs = fileToBeCached.getFileSystem(conf);
FileStatus filesStatus = fs.getFileStatus(fileToBeCached);

if (filesStatus.isDirectory()) {
    for (FileStatus f : fs.listStatus(fileToBeCached)) {
        if (f.getPath().getName().startsWith("part")) {
            job.addCacheFile(f.getPath().toUri());
        }
    }
} else {
    job.addCacheFile(fileToBeCached.toUri());
}
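
For reference, here is a minimal sketch (not from the original post) of registering the same file with a fully qualified URI, so the cached entry carries an explicit scheme and authority resolved from fs.defaultFS. The fs.makeQualified call and the part-r-00000 file name are assumptions for illustration only:

// Hypothetical sketch: qualify the HDFS path before registering it, so the
// cached URI includes the scheme and authority (e.g. hdfs://namenode:8020/...).
// The concrete part file name below is assumed for illustration.
Path rawPath = new Path("/test-data/cacheFiles/part-r-00000");
Path qualifiedPath = fs.makeQualified(rawPath);
System.out.println("Registering cache file: " + qualifiedPath.toUri());
job.addCacheFile(qualifiedPath.toUri());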

In the Mapper class:

public static class Map extends Mapper<Text, Text, Text, Text> {
    private Set<String> recordSet = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] files = context.getCacheFiles();
        if (files.length > 0) {
            for (URI uri : files) {
                System.out.println("Cached file: " + uri);
                File path = new File(uri.getPath());
                loadCache(path);
            }
        }
    }

    private void loadCache(File file) throws IOException {
        recordSet.addAll(FileUtils.readLines(file));
    }
}
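
By way of comparison, a minimal sketch of a setup method that reads the cached URIs directly through the HDFS FileSystem API instead of a local java.io.File. The stream handling and the assumed imports (org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, java.io.BufferedReader, java.io.InputStreamReader, java.nio.charset.StandardCharsets) are illustrative, not part of the original code:

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Sketch under the assumption that the cached URIs are HDFS paths that
    // can be opened directly with the FileSystem API.
    URI[] files = context.getCacheFiles();
    if (files != null) {
        FileSystem fs = FileSystem.get(context.getConfiguration());
        for (URI uri : files) {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(new Path(uri)), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    recordSet.add(line);
                }
            }
        }
    }
}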

0 Answers:

There are no answers yet.