I am new to MapReduce and Avro. My project basically consists of a single mapper function that takes Text data and outputs Avro data. For this I have declared my mapper as:
public class AvroMapper extends Mapper<LongWritable, Text, AvroKey<CharSequence>, NullWritable>
I am having trouble setting the key schema for the Oozie workflow. My Oozie configuration is:
<property>
<name>mapred.output.key.class</name>
<value>org.apache.avro.mapred.NullWriatable</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.avro.mapred.AvroKey</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.avro.mapred.NullWritable</value>
</property>
<property>
<name>mapred.output.key.comparator.class</name>
<value>org.apache.avro.mapred.AvroKeyComparator</value>
</property>
<property>
<name>avro.schema.output.key</name>
<value>{my JSON schema}</value>
</property>
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.TextInputFormat</value>
</property>
<property>
<name>mapreduce.outputformat.class</name>
<value>org.apache.avro.mapreduce.AvroKeyOutputFormat</value>
</property>
But the job still throws:
java.lang.NullPointerException
at org.apache.avro.mapred.Pair.getKeySchema(Pair.java:68)
at org.apache.avro.mapred.AvroKeyComparator.setConf(AvroKeyComparator.java:39)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:818)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:836)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:376)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:85)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:584)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:656)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.ha...
Please point out where I am going wrong.
Answer 0 (score: 1)
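Use the AvroMapper and AvroReducer classes from org.apache.avro.mapred instead. That approach was easier for me. In that case, remember to use the Pair class and its schema.
In any case, configuring Oozie for Avro is not trivial. To save you some time, here is my configuration for AvroMapper and AvroReducer: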
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
<property>
<name>avro.input.schema</name>
<value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
</property>
<property>
<name>avro.output.schema</name>
<value>{"type":"record","name":"Pair","namespace":"org.apache.avro.mapred","fields":[... your fields ...]}</value>
</property>
<property>
<name>avro.mapper</name>
<value>your.mapper.class.Name</value>
</property>
<property>
<name>avro.reducer</name>
<value>your.reducer.class.Name</value>
</property>
<property>
<name>mapred.output.key.comparator.class</name>
<value>org.apache.avro.mapred.AvroKeyComparator</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>org.apache.avro.mapred.HadoopReducer</value>
</property>
<property>
<name>mapred.output.format.class</name>
<value>org.apache.avro.mapred.AvroOutputFormat</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>org.apache.avro.mapred.HadoopMapper</value>
</property>
<property>
<name>mapred.input.format.class</name>
<value>org.apache.avro.mapred.AvroInputFormat</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.avro.mapred.AvroWrapper</value>
</property>
<property>
<name>mapred.mapoutput.value.class</name>
<value>org.apache.avro.mapred.AvroValue</value>
</property>
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.avro.mapred.AvroSerialization</value>
</property>
<property>
<name>mapred.mapoutput.key.class</name>
<value>org.apache.avro.mapred.AvroKey</value>
</property>
</configuration>
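The schema values in the configuration above elide the fields of the Pair record. As a rough illustration only, the sketch below builds such a value by hand with plain JDK string handling, assuming a string key and a null value (both are assumptions; substitute your own key and value schemas). The class `PairSchemaSketch` and its methods are hypothetical helpers, not part of Avro or Oozie. On a real job you would more likely let Avro generate the schema for you.

```java
// Hypothetical helper, JDK only: builds the JSON text of a Pair-style
// record schema and wraps it in a Hadoop/Oozie <property> element.
final class PairSchemaSketch {

    // keyType and valueType are Avro schemas already rendered as JSON,
    // for example "\"string\"" or a full {"type":"record",...} object.
    static String pairSchema(String keyType, String valueType) {
        return "{\"type\":\"record\",\"name\":\"Pair\","
             + "\"namespace\":\"org.apache.avro.mapred\",\"fields\":["
             + "{\"name\":\"key\",\"type\":" + keyType + "},"
             + "{\"name\":\"value\",\"type\":" + valueType + "}]}";
    }

    // Renders one <property> block, escaping the XML special characters
    // that could otherwise break the configuration file.
    static String property(String name, String value) {
        String escaped = value.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace(">", "&gt;");
        return "<property>\n<name>" + name + "</name>\n<value>"
             + escaped + "</value>\n</property>";
    }

    public static void main(String[] args) {
        // Assumed types: string key, null value (mirrors a key-only output).
        String schema = pairSchema("\"string\"", "\"null\"");
        System.out.println(property("avro.output.schema", schema));
    }
}
```

Running `main` prints a `<property>` block that can be pasted in place of the elided `avro.output.schema` value. The point is only to show that the record needs a `key` field and a `value` field.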