Defining the Avro key schema in Oozie

Time: 2014-02-12 22:51:51

Tags: hadoop mapreduce oozie avro

I am new to MapReduce and Avro. My project is essentially map-only: the mapper takes Text data and outputs Avro data. To that end I declared my mapper as:

public class AvroMapper extends Mapper<LongWritable, Text, AvroKey<CharSequence>, NullWritable>

I am having trouble setting the key schema for the Oozie workflow. My Oozie configuration is:

<property>
    <name>mapred.output.key.class</name>
    <value>org.apache.avro.mapred.NullWriatable</value>
</property>
<property>
    <name>mapred.mapoutput.key.class</name>
    <value>org.apache.avro.mapred.AvroKey</value>
</property>
<property>
    <name>mapred.mapoutput.value.class</name>
    <value>org.apache.avro.mapred.NullWritable</value>
</property>
<property>
    <name>mapred.output.key.comparator.class</name>
    <value>org.apache.avro.mapred.AvroKeyComparator</value>
</property>
<property>
    <name>avro.schema.output.key</name>
    <value>{my JSON schema}</value>
</property>
<property>
    <name>mapreduce.inputformat.class</name>
    <value>org.apache.hadoop.mapreduce.lib.input.TextInputFormat</value>
</property>
<property>
    <name>mapreduce.outputformat.class</name>
    <value>org.apache.avro.mapreduce.AvroKeyOutputFormat</value>
</property>

But it still throws:

java.lang.NullPointerException
at org.apache.avro.mapred.Pair.getKeySchema(Pair.java:68)
at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:818)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:836)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.ha...

Please point out where I went wrong.

1 answer:

Answer 0 (score: 1)

Use the AvroMapper and AvroReducer classes instead. That approach was easier for me. Remember to use the Pair class and its schema in that case.
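To make "the Pair class and its schema" concrete, here is a minimal sketch of what such a schema can look like as JSON. The stack trace in the question passes through Pair.getKeySchema, which suggests the comparator expects a record with a key field. The class PairSchemaSketch and its method are hypothetical helpers, not part of Avro, and the "key"/"value" field names are my understanding of what Avro's Pair uses. If Avro is on your classpath, Pair.getPairSchema(keySchema, valueSchema).toString() is the safer source, and its output may carry extra attributes.

```java
public class PairSchemaSketch {
    // Hypothetical helper: wraps a key schema and a value schema (both given
    // as JSON text) in a record named org.apache.avro.mapred.Pair with
    // "key" and "value" fields. Assumes both arguments are valid schema JSON.
    public static String pairSchema(String keySchemaJson, String valueSchemaJson) {
        return "{\"type\":\"record\",\"name\":\"Pair\","
             + "\"namespace\":\"org.apache.avro.mapred\",\"fields\":["
             + "{\"name\":\"key\",\"type\":" + keySchemaJson + "},"
             + "{\"name\":\"value\",\"type\":" + valueSchemaJson + "}]}";
    }

    public static void main(String[] args) {
        // Illustrative choice only: string keys and null values.
        System.out.println(pairSchema("\"string\"", "\"null\""));
    }
}
```

The printed string is the kind of value that would go into the schema properties of the Oozie configuration; substitute your own key and value schemas.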

Either way, configuring Oozie for Avro is not trivial. To save you some time, here is my configuration for AvroMapper and AvroReducer:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>avro.input.schema</name>
        <value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
    </property>
    <property>
        <name>avro.output.schema</name>
        <value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
    </property>
    <property>
        <name>avro.mapper</name>
        <value>your.mapper.class.Name</value>
    </property>
    <property>
        <name>avro.reducer</name>
        <value>your.reducer.class.Name</value>
    </property>
    <property>
        <name>mapred.output.key.comparator.class</name>
        <value>org.apache.avro.mapred.AvroKeyComparator</value>
    </property>
    <property>
        <name>mapred.reducer.class</name>
        <value>org.apache.avro.mapred.HadoopReducer</value>
    </property>
    <property>
        <name>mapred.output.format.class</name>
        <value>org.apache.avro.mapred.AvroOutputFormat</value>
    </property>
    <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.avro.mapred.HadoopMapper</value>
    </property>
    <property>
        <name>mapred.input.format.class</name>
        <value>org.apache.avro.mapred.AvroInputFormat</value>
    </property>
    <property>
        <name>mapred.output.key.class</name>
        <value>org.apache.avro.mapred.AvroWrapper</value>
    </property>
    <property>
        <name>mapred.mapoutput.value.class</name>
        <value>org.apache.avro.mapred.AvroValue</value>
    </property>
    <property>
        <name>io.serializations</name>
        <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.avro.mapred.AvroSerialization</value>
    </property>
    <property>
        <name>mapred.mapoutput.key.class</name>
        <value>org.apache.avro.mapred.AvroKey</value>
    </property>
</configuration>
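Since the schema is pasted into the XML as raw JSON, hand-editing a file like the one above is easy to get wrong. One option is to generate the property entries from code. The following is only a sketch: OoziePropertySketch and its methods are hypothetical names, it uses nothing beyond the Java standard library, and it only escapes the characters that are unsafe in XML element content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OoziePropertySketch {
    // Escapes the characters that are unsafe inside XML element content.
    // '&' is replaced first so the entities added afterwards are not re-escaped.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders one <property> element in the same layout as the configuration above.
    public static String property(String name, String value) {
        return "    <property>\n"
             + "        <name>" + escapeXml(name) + "</name>\n"
             + "        <value>" + escapeXml(value) + "</value>\n"
             + "    </property>\n";
    }

    // Renders a whole <configuration> block, keeping the map's iteration order.
    public static String configuration(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(property(e.getKey(), e.getValue()));
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Names and values copied from the configuration above.
        props.put("avro.mapper", "your.mapper.class.Name");
        props.put("mapred.mapper.class", "org.apache.avro.mapred.HadoopMapper");
        System.out.print(configuration(props));
    }
}
```

A LinkedHashMap keeps the properties in insertion order, which makes the generated file easier to compare against a hand-written one.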