Question

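I want to compare two file systems. One contains only a little data, the other contains much more. I decided to use HDFS for this and have already loaded both onto HDFS.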

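After some research I found that I can use a cache file in my MapReduce job, so I did. Now I don't know the best way to manage the data files and iterate over both of them.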

I just need some ideas on how to implement this correctly and with good performance.

Here is the mapper code:

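import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class DataPreparationMapper extends Mapper<LongWritable, Text, Text, Text> {
    // URI of the first file registered in the distributed cache
    private URI file;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // getCacheFiles() returns null when no cache file was added to the job
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null || cacheFiles.length == 0) {
            throw new IOException("No cache file was registered for this job");
        }
        this.file = cacheFiles[0];
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Debug output only: emit the cache file's URI to confirm it is visible to the mapper
        context.write(new Text(this.file.toString()), new Text("File found!"));
    }
}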

Note: the output only exists to check whether I can read the path of the cached data, so the code does not do anything productive yet.
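A minimal sketch of one common pattern for this kind of comparison, written in plain Java without the Hadoop classes so that it runs on its own: the smaller data set is loaded into memory once (the work setup() would do with the cached file), and each record of the larger data set is then checked against it (the work map() would do per input line). The names CacheCompareSketch, loadSmallSide and compare, the MATCH / ONLY_IN_LARGE labels, and the assumption that both data sets are line-oriented and that the smaller one fits in memory are all hypothetical and not taken from the code above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: it only mirrors the setup()/map() split.
final class CacheCompareSketch {

    // What setup() would do once per task: keep the small (cached) data set in memory.
    // In a real job the lines would be read from the cached file; here they are passed in.
    static Set<String> loadSmallSide(List<String> lines) {
        Set<String> records = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed); // blank lines are skipped
            }
        }
        return records;
    }

    // What map() would do for each record of the large data set:
    // a hash lookup instead of re-reading the small file for every record.
    static String compare(String record, Set<String> smallSide) {
        return smallSide.contains(record.trim()) ? "MATCH" : "ONLY_IN_LARGE";
    }

    public static void main(String[] args) {
        Set<String> small = loadSmallSide(Arrays.asList("a", "b"));
        for (String record : new String[] {"a", "c"}) {
            System.out.println(record + "\t" + compare(record, small));
        }
    }
}
```

In the mapper, loadSmallSide would correspond to reading the cached file inside setup(), and compare to the body of map(), with the result written through context.write. Records that exist only in the smaller data set are not detected by this sketch; finding those would need an additional pass or a reduce-side join.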

Hadoop: comparing 2 different file systems

0 answers: