我想使用MapReduce在txt文件中获取每年的最大值和最小值。文件中的内容如下所示:
1979 2, 26
1980 26, 31
...
第一列表示年份。 我希望MapReduce给我这样的最终输出:
public class MaxValue_MinValue {
public static class E_Mappter extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] elements = line.split("\\s");
Text outputKey = new Text(elements[0]);
for(int i = 1; i<elements.length;i++) {
context.write(outputKey, new IntWritable(Integer.parseInt(elements[i])));
}
}
}
public static class E_Reducer extends Reducer<Text,IntWritable, Text, Text> {
public void reduce(Text inKey,Iterable<IntWritable> inValues, Context context) throws IOException, InterruptedException {
int maxTemp = 0;
int minTemp = 0;
for(IntWritable ele : inValues) {
if (ele.get() > maxTemp) {
maxTemp = ele.get();
}
if (ele.get() < minTemp) {
minTemp = ele.get();
}
}
context.write(inKey, new Text("Max is " + maxTemp + ", Min is " + minTemp));
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"Max value, min value for each year");
job.setJarByClass(MaxValue_MinValue.class);
job.setMapperClass(E_Mappter.class);
job.setReducerClass(E_Reducer.class);
job.setCombinerClass(E_Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0:1);
}
}
所以我这样用Java编写代码:
hadoop@steven81-HP:/usr/local/hadoop277$ ./bin/hadoop jar ./myApp/MinValue_MaxValue.jar /user/hadoop/input/Electrical__Consumption.txt /user/hadoop/output7
19/04/10 16:59:21 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/04/10 16:59:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/04/10 16:59:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/10 16:59:22 INFO input.FileInputFormat: Total input paths to process : 1
19/04/10 16:59:22 INFO mapreduce.JobSubmitter: number of splits:1
19/04/10 16:59:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1076320101_0001
19/04/10 16:59:23 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/04/10 16:59:23 INFO mapreduce.Job: Running job: job_local1076320101_0001
19/04/10 16:59:23 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/04/10 16:59:23 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/04/10 16:59:23 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/04/10 16:59:23 INFO mapred.LocalJobRunner: Waiting for map tasks
19/04/10 16:59:23 INFO mapred.LocalJobRunner: Starting task: attempt_local1076320101_0001_m_000000_0
19/04/10 16:59:23 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/04/10 16:59:23 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/04/10 16:59:23 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/Electrical__Consumption.txt:0+204
19/04/10 16:59:23 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/04/10 16:59:23 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/04/10 16:59:23 INFO mapred.MapTask: soft limit at 83886080
19/04/10 16:59:23 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/04/10 16:59:23 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/04/10 16:59:23 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/04/10 16:59:24 INFO mapred.MapTask: Starting flush of map output
19/04/10 16:59:24 INFO mapred.LocalJobRunner: map task executor complete.
19/04/10 16:59:24 INFO mapreduce.Job: Job job_local1076320101_0001 running in uber mode : false
19/04/10 16:59:24 INFO mapreduce.Job: map 0% reduce 0%
19/04/10 16:59:24 WARN mapred.LocalJobRunner: job_local1076320101_0001
java.lang.Exception: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at test.map.reduce.MaxValue_MinValue$E_Mappter.map(MaxValue_MinValue.java:23)
at test.map.reduce.MaxValue_MinValue$E_Mappter.map(MaxValue_MinValue.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/04/10 16:59:25 INFO mapreduce.Job: Job job_local1076320101_0001 failed with state FAILED due to: NA
19/04/10 16:59:25 INFO mapreduce.Job: Counters: 0
但是当我运行它时,出现以下错误消息:
let language = NSLocale.preferredLanguages[0]
print(language)//en
我对以下错误感到困惑:“地图值类型不匹配:预期的org.apache.hadoop.io.Text,收到了org.apache.hadoop.io.IntWritable”,因为地图的输出是(Text,IntWritable)并且reduce的输入也是(Text,IntWritable),所以我不知道为什么,有人可以帮助我吗?
答案 0 :(得分:1)
Combiner必须能够从Mapper接受数据,并且必须输出可用作Reducer输入的数据。在您的情况下,Combiner输出类型为<Text, Text>
,但Reducer输入类型为<Text, IntWritable>
,因此它们不匹配。
您实际上不需要MapReduce来解决此问题,因为您可以在每一行上获得每年的所有数据,并且不需要在行之间进行比较。
String line = value.toString();
String[] elements = line.split("\\s");
Text year = new Text(elements[0]);
int maxTemp = INTEGER.MIN_VALUE;
int minTemp = INTEGER.MAX_VALUE;
int temp;
for(int i = 1; i<elements.length;i++) {
temp = Integer.parseInt(elements[i])
if (temp < minTemp) {
minTemp = temp;
} else if (temp > maxTemp) {
maxTemp = temp;
}
}
System.out.println("For year " + year + ", the minimum temperature was " + minTemp + " and the maximum temperature was " + maxTemp);