MapReduce job fails in Oozie

Time: 2014-01-04 20:29:51

Tags: mapreduce oozie

I have a map-only job that takes sequence files as input (key is Text, value is BytesWritable) and writes its output to sequence files (key is NullWritable, value is Text).

Java class

import java.io.*;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class Test {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = new Job(conf, "Test");

        job.setJarByClass(Test.class);
        job.setMapperClass(TestMapper.class);

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit();
    }

    public static class TestMapper extends Mapper<Text, BytesWritable, NullWritable, Text> {
        Text outValue = new Text("");
        int counter = 0;
        @Override
        public void map(Text filename, BytesWritable data, Context context) throws IOException, InterruptedException {
            // logic
        }
    }
}

The job works fine when run from the Unix command line, but when the same job is scheduled in Oozie it fails with the following error:

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at Test$TestMapper.map(Test.java:56)

Job configuration in Oozie

<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/temp</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>Test$TestMapper</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>0</value>
</property>
<property>
<name>mapreduce.job.output.key.class</name>
<value>org.apache.hadoop.io.NullWritable</value>
</property>
<property>
<name>mapreduce.job.output.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
<property>
<name>mapreduce.job.outputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat</value>
</property>
<property>
<name>mapreduce.job.mapinput.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.job.mapinput.value.class</name>
<value>org.apache.hadoop.io.BytesWritable</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
</configuration>

Can someone tell me what is wrong here? Thanks.

1 Answer:

Answer 0: (score: 1)

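The ClassCastException indicates that Oozie is still using the default input format, TextInputFormat, whose key type is LongWritable. Because the mapper declares Text as its input key type, the keys handed to the mapper do not match what it expects. So mapreduce.job.inputformat.class is not the correct configuration key here.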
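
The mismatch can be reproduced without Hadoop. The following is only a minimal sketch, not Hadoop code: MiniMapper, TextKey, and OffsetKey are hypothetical stand-ins for Mapper, Text, and LongWritable. It illustrates why nothing complains until map is actually called: generic type parameters are erased at run time, so the framework can hand the mapper any key object, and the cast to the declared key type fails only when the input format produces a different type.

```java
public class ErasureCastSketch {

    // Hypothetical stand-in for Hadoop's Mapper<KEYIN, VALUEIN, ...>; not the real class.
    static abstract class MiniMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Hypothetical stand-in for a Text key (what SequenceFileInputFormat would deliver here).
    static final class TextKey {
        final String name;
        TextKey(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a LongWritable key (the byte offset TextInputFormat delivers).
    static final class OffsetKey {
        final long offset;
        OffsetKey(long offset) { this.offset = offset; }
    }

    // A mapper written for TextKey keys, just as TestMapper is written for Text keys.
    static final class FileNameMapper extends MiniMapper<TextKey, byte[]> {
        @Override
        String map(TextKey filename, byte[] data) {
            return filename.name + ":" + data.length;
        }
    }

    // The "framework" side only sees the erased (raw) mapper type, so the compiler
    // cannot verify that the key type matches what the mapper declares.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String runWithKey(MiniMapper mapper, Object key, Object value) {
        // The compiler-generated bridge method in FileNameMapper casts key to TextKey.
        return mapper.map(key, value);
    }

    public static void main(String[] args) {
        FileNameMapper mapper = new FileNameMapper();

        // Matching key type: works.
        System.out.println(runWithKey(mapper, new TextKey("a.bin"), new byte[3]));

        // Key type of the wrong input format: fails with ClassCastException at run time.
        try {
            runWithKey(mapper, new OffsetKey(0L), new byte[3]);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The second call fails in the same way as the Oozie job: the mapper is correct, but the keys it receives come from an input format other than the one it was written for.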

(After some trial and error)

We found that the correct property name is mapreduce.inputformat.class, i.e.:

<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
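
Why the original key had no effect can be sketched as a plain lookup with a default. This is an illustration under assumptions, not the Hadoop source: resolveInputFormat is a hypothetical helper, and the premise is that the Hadoop release behind this Oozie installation reads mapreduce.inputformat.class (the name used by older, MRv1-era releases), while mapreduce.job.inputformat.class is the name used by newer releases. Which key applies depends on the Hadoop version, so it is worth checking against your cluster.

```java
import java.util.HashMap;
import java.util.Map;

public class InputFormatKeySketch {

    // Default used when the key the framework reads is absent from the configuration.
    static final String DEFAULT_INPUT_FORMAT =
            "org.apache.hadoop.mapreduce.lib.input.TextInputFormat";

    // Hypothetical helper: looks up the input format under the key the framework reads
    // and silently falls back to the default if that key is not set.
    static String resolveInputFormat(Map<String, String> conf, String keyTheFrameworkReads) {
        return conf.getOrDefault(keyTheFrameworkReads, DEFAULT_INPUT_FORMAT);
    }

    public static void main(String[] args) {
        String sequenceFormat = "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat";
        String keyRead = "mapreduce.inputformat.class"; // assumed key for this Hadoop version

        // Configuration as in the question: the value sits under a key nobody reads.
        Map<String, String> asInQuestion = new HashMap<>();
        asInQuestion.put("mapreduce.job.inputformat.class", sequenceFormat);

        // Configuration as in the answer: the value sits under the key that is read.
        Map<String, String> asInAnswer = new HashMap<>();
        asInAnswer.put("mapreduce.inputformat.class", sequenceFormat);

        System.out.println("question config -> " + resolveInputFormat(asInQuestion, keyRead));
        System.out.println("answer config   -> " + resolveInputFormat(asInAnswer, keyRead));
    }
}
```

An unrecognized property name is not reported as an error; it is simply ignored, which is why the job ran with the default input format instead of failing at submission. If the same reasoning holds for the output side, mapreduce.job.outputformat.class in the question's configuration may deserve the same check.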