Question

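I have an MR program that runs perfectly on a set of SequenceFiles and produces the expected output. When I try to do the same thing through an Oozie workflow, the InputFormat class property is not recognized for some reason, and I believe the input is being treated as the default TextInputFormat.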

Here is how the mapper is declared. The SequenceFile key is LongWritable and the value is Text.

public static class FeederCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable>{

    // setup map function for stripping the feeder for a zone from the input
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{

        final int count = 1;            

        // convert input rec to string          
        String inRec = value.toString();

        System.out.println("Feeder:" + inRec);

        // strip out the feeder from record
        String feeder = inRec.substring(3, 7);          

        // write the key+value as map output
        context.write(new Text(feeder), new IntWritable(count));
    }
}

My application's workflow layout is as follows:

 /{$namenode}/workflow.xml
 /{$namenode}/lib/FeederCounterDriver.jar

Below is my workflow.xml. $namenode, $jobtracker, $outputdir and $inputdir are defined in the job.properties file.

<map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
    <delete path="${nameNode}/${outputDir}"/>
    </prepare>
  <configuration>
    <property>
        <name>mapred.reducer.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.mapper.new-api</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.job.queue.name</name>
        <value>${queueName}</value>
    </property>
    <property>
        <name>mapred.input.dir</name>
        <value>/flume/events/sincal*</value>
    </property>
    <property>
        <name>mapred.output.dir</name>
        <value>${outputDir}</value>
    </property>
    <property>
        <name>mapred.input.format.class</name>
        <value>org.apache.hadoop.mapred.SequenceFileInputFormat</value>
    </property>
    <property>
        <name>mapred.output.format.class</name>
        <value>org.apache.hadoop.mapred.TextOutputFormat</value>
    </property>
    <property>
        <name>mapred.input.key.class</name>
        <value>org.apache.hadoop.io.LongWritable</value>
    </property>
    <property>
        <name>mapred.input.value.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapred.output.key.class</name>
        <value>org.apache.hadoop.io.Text</value>
    </property>
    <property>
        <name>mapred.output.value.class</name>
        <value>org.apache.hadoop.io.IntWritable</value>
    </property>
    <property>
        <name>mapreduce.map.class</name>
        <value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterMapper</value>
    </property>
    <property>
        <name>mapreduce.reduce.class</name>
        <value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterReducer</value>
    </property>
    <property>
        <name>mapreduce.map.tasks</name>
        <value>1</value>
    </property>                
</configuration>
</map-reduce>

A snippet of the stdout (first 2 lines) when running the MR job is:

 Feeder:00107371PA1700TEET67576     LKHS  5666LH 2.....           
 Feeder:00107231PA1300TXDS  8731TX 1FSHS  8731FH 1.....

A snippet of the output (first 3 lines) when I run it through the Oozie workflow is:

 Feeder:SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text�������b'b��X�...
 Feeder:��00105271PA1000FSHS  2255FH 1TXDS  2255TX 1.....
 Feeder:��00103171PA1800LKHS  3192LH 2LKHS  2335LH 1.....

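Given the above output from the Oozie workflow, I strongly doubt that the input format SequenceFileInputFormat specified in workflow.xml is even being considered; otherwise, I suspect it is being overridden.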

Any input on this would be helpful. Thanks.

Answer 1

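In the JobTracker, find the job.xml created for this MapReduce job and check which input format class is set there. That will confirm whether the input format is the problem.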
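If the job.xml has been downloaded locally, the check described above could be scripted. The following is only a minimal sketch using the JDK's XML parser; the class name JobXmlInputFormatCheck and the substring match on "inputformat" / "input.format" are assumptions made for illustration, not part of Hadoop or Oozie.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists input-format-related properties found in a job.xml.
public class JobXmlInputFormatCheck {

    // Returns every <property> whose name mentions an input format,
    // in document order, mapped to its value.
    static Map<String, String> inputFormatProperties(InputStream jobXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(jobXml);
        Map<String, String> found = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = childText(p, "name");
            // Assumed match rule: covers both key spellings seen in this thread.
            if (name.contains("inputformat") || name.contains("input.format")) {
                found.put(name, childText(p, "value"));
            }
        }
        return found;
    }

    // Text of the first child element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java JobXmlInputFormatCheck <path-to-job.xml>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            inputFormatProperties(in).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

If no input format property shows up in the listing, or only one under a key the job does not read, that would be consistent with the job falling back to its default input format.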

Answer 2

I had a very similar problem, and I got Oozie to use the correct input format by setting my property like this:

<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>

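Note that the property name has one dot fewer (check your version) and that the class changes as well: the name is mapreduce.inputformat.class rather than mapred.input.format.class, and the class comes from the org.apache.hadoop.mapreduce.lib.input package rather than org.apache.hadoop.mapred.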
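The mismatch this answer points at (new-API flag enabled, input format set under the old-API key) can be illustrated with a small stand-alone check. This is a hedged sketch, not Oozie or Hadoop code: the class InputFormatKeyCheck is hypothetical, and the key names are the ones quoted in this thread, which appear to apply to Hadoop 1.x. Other releases may use different names, so check your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lint-style check over workflow properties.
public class InputFormatKeyCheck {

    // Key names taken from this thread (assumed to match Hadoop 1.x).
    static final String NEW_API_FLAG = "mapred.mapper.new-api";
    static final String OLD_API_KEY = "mapred.input.format.class";
    static final String NEW_API_KEY = "mapreduce.inputformat.class";

    // Returns a warning when the input format is set only under the key that
    // belongs to the other API, or null when nothing suspicious is found.
    static String check(Map<String, String> props) {
        boolean newApi = Boolean.parseBoolean(props.getOrDefault(NEW_API_FLAG, "false"));
        String expected = newApi ? NEW_API_KEY : OLD_API_KEY;
        String other = newApi ? OLD_API_KEY : NEW_API_KEY;
        if (!props.containsKey(expected) && props.containsKey(other)) {
            return "input format is set under " + other
                    + ", but this configuration is expected to use " + expected;
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the relevant properties from the workflow.xml in the question.
        Map<String, String> props = new LinkedHashMap<>();
        props.put(NEW_API_FLAG, "true");
        props.put(OLD_API_KEY, "org.apache.hadoop.mapred.SequenceFileInputFormat");
        System.out.println(check(props));
    }
}
```

With the properties from the question, the check reports a warning. Adding the property shown in this answer makes it return null.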

SequenceFile input format not recognized in Oozie workflow.xml?
