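I have a 5 GB file and I am running a simple MapReduce word count job on it. The block size is 128 MB, and it is a single-node cluster. After reading about 1.2 million records, the job starts reading the same file again from the beginning. The pseudocode is below.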
// Driver: configure and submit the job
Configuration objconf = new Configuration();
Path objInputPath = new Path("/home/abc/Desktop/Debug.csv");
Path objoutPath = new Path("/home/abc/Desktop/Outpath.csv");
Job objJob = new Job(objconf, "WordCount");
FileInputFormat.setInputPaths(objJob, objInputPath);
FileOutputFormat.setOutputPath(objJob, objoutPath);
objJob.setJarByClass(WordCount.class);
objJob.setMapperClass(WCMapper.class);
objJob.setJobName("WordCount");
objJob.setInputFormatClass(TextInputFormat.class);
objJob.setOutputFormatClass(TextOutputFormat.class);
// Block until the job finishes; 0 on success, 1 on failure
int j = objJob.waitForCompletion(true) ? 0 : 1;
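As a sanity check on the numbers above, here is a rough sketch of how many input splits (and therefore map tasks) a file of this size would produce. It assumes the split size equals the stated 128 MB block size and that 5 GB means 5 × 1024 MB; the class name `SplitEstimate` and the method `estimateSplits` are hypothetical and not part of the job. The real count can differ slightly, since it depends on the configured minimum and maximum split sizes.

```java
public class SplitEstimate {
    // Ceiling division: number of splits if each split covers at most splitBytes.
    static long estimateSplits(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 5L * 1024 * 1024 * 1024; // 5 GB, assumed binary units
        long splitBytes = 128L * 1024 * 1024;     // assumed: split size == 128 MB block size
        System.out.println(estimateSplits(fileBytes, splitBytes)); // prints 40
    }
}
```

Under these assumptions the file is processed as roughly 40 separate splits, each handled by its own map task, rather than in one continuous pass.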
Mapper.java
private IntWritable one = new IntWritable(1);
private Text word = new Text();
// presumably inside map(LongWritable key, Text value, Context context):
String line = value.toString();
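The rest of the map method is not shown. As a sketch only, the per-line logic of a typical word count looks like the plain-Java version below, which runs without Hadoop. It assumes words are separated by whitespace (a CSV file may need a comma delimiter instead); the class `LineTokenCount` and the method `countWords` are hypothetical names. In an actual mapper, each token would be emitted with `context.write(word, one)` instead of being counted in a local map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LineTokenCount {
    // Counts whitespace-separated tokens in one input line.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // A real mapper would emit (token, 1) here rather than summing locally.
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a b a"));
    }
}
```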