Some issues with Avro MapReduce

Time: 2015-05-03 20:42:18

Tags: java hadoop mapreduce avro hadoop2

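To start with, I am running a MapReduce job through Oozie as a java action. When the job runs, the map task fails with the following error: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found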

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:184)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class org.apache.avro.mapreduce.AvroKeyInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
    ... 8 more

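At first I read that I had to supply the necessary jars through the libjars mechanism. After doing that, I was fairly sure that all the jars were available to my driver code.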
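The stack trace above is raised inside YarnChild, so the class is missing from the classpath of the map task, which is separate from the classpath of the driver. One way to narrow this down is to log, from both the driver and the task, whether the class is visible and which jar it comes from. The following is a minimal sketch using only the JDK; the class name `ClassProbe` and its methods are hypothetical helpers, not part of Hadoop, Avro, or the original post.

```java
import java.security.CodeSource;

public class ClassProbe {
    /** Returns true if the named class can be located through the given class loader. */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes where a class was loaded from, or reports that it is missing. */
    public static String describe(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source
            return className + " <- " + (src == null ? "bootstrap/JDK" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT FOUND";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        System.out.println(describe("org.apache.avro.mapreduce.AvroKeyInputFormat", loader));
    }
}
```

Calling `describe` once in the driver's `run` method and once in the mapper's `setup` method would show whether the two JVMs really see the same jars.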


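<action name="run-workflow">
    <java> <!-- job-tracker, name-node, configuration and main-class are not shown in the post -->
        <prepare>
            <delete path='${nameNode}/user/dhruvk/avro_output'/>
        </prepare>
        <java-opts>-Dqueue=${queueName} -DinputPath=${nameNode}/user/dhruvk/avro_input -DoutputPath=${nameNode}/user/dhruvk/avro_output</java-opts>
    </java>
    <ok to="end"/>
    <error to="error"/>
</action>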


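import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.dhruvk.models.User;

public class AvroDriver extends Configured implements Tool {
  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new AvroDriver(), args);
    System.exit(exitCode);
  }

  @Override
  public int run(String[] args) throws Exception {
    Configuration configuration = getConf();

    Job job = Job.getInstance(configuration, this.getClass().getSimpleName());

    // getProperty is a helper that is not shown in the post; the paths are
    // passed as -D options in the workflow's java-opts.
    String inputDir = getProperty("inputPath");
    String outputDir = getProperty("outputPath");

    job.setJobName("Color count");
    FileInputFormat.addInputPath(job, new Path(inputDir));
    FileOutputFormat.setOutputPath(job, new Path(outputDir));

    AvroJob.setInputKeySchema(job, User.getClassSchema());
    // The rest of the job setup (input format, mapper, output types) is not shown in the post.

    return job.waitForCompletion(true) ? 0 : 1;
  }
}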

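At this point I really was not sure why the class could not be found, so as an experiment I changed the packaging to jar-with-dependencies using the Maven assembly plugin. After that change, the task log shows a different error: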








Apr 30, 2015 3:48:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
Apr 30, 2015 3:48:34 PM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Apr 30, 2015 3:48:34 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
Apr 30, 2015 3:48:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
Apr 30, 2015 3:48:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
Error: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.dhruvk.models.User
    at com.dhruvk.ColorCountMapper.map(ColorCountMapper.java:15)
    at com.dhruvk.ColorCountMapper.map(ColorCountMapper.java:12)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)


public class ColorCountMapper extends Mapper<AvroKey<User>, NullWritable, Text, IntWritable> {
  @Override
  public void map(AvroKey<User> user, NullWritable value, Context context) throws IOException, InterruptedException {
    CharSequence color = user.datum().getFavoriteColor();
    if (color == null) {
      color = "none";
    }
    context.write(new Text(color.toString()), new IntWritable(1));
  }
}
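The ClassCastException is reported in `map` rather than where the record is read because generic type parameters are erased at runtime: a wrapper declared as `AvroKey<User>` can hold any object, and the cast to `User` only happens when the datum is used. The sketch below reproduces that mechanism with JDK-only stand-ins; `Key`, `SpecificUser`, and `GenericRecordLike` are hypothetical classes made up for illustration, not Avro types.

```java
public class ErasureDemo {
    /** Hypothetical stand-in for a generic wrapper such as AvroKey<T>. */
    public static class Key<T> {
        private final Object datum;
        public Key(Object datum) { this.datum = datum; }

        @SuppressWarnings("unchecked")
        public T datum() { return (T) datum; } // unchecked cast: nothing is verified here
    }

    /** Hypothetical stand-in for the generated specific class. */
    public static class SpecificUser {
        public String getFavoriteColor() { return "red"; }
    }

    /** Hypothetical stand-in for a generic record produced by the reader. */
    public static class GenericRecordLike { }

    @SuppressWarnings("unchecked")
    public static String readColor(Object key) {
        Key<SpecificUser> typed = (Key<SpecificUser>) key; // succeeds for any Key
        // The compiler inserts the cast to SpecificUser on this line,
        // so this is where a mismatched datum fails.
        return typed.datum().getFavoriteColor();
    }

    public static boolean failsWithClassCast(Object key) {
        try {
            readColor(key);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

So the mapper's declared type says nothing about what the input format actually produced; the trace only shows that the reader handed over a GenericData.Record instead of a User.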


Edit: This is the schema that was used to generate the files.

{
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
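One detail that may be relevant, offered as a possible explanation rather than a confirmed diagnosis: the schema declares no namespace, so its full name is just `User`, while the generated class lives in `com.dhruvk.models`. As far as I know, Avro's specific reader looks up the Java class by the schema's full name and falls back to a generic record when no such class is found. The sketch below imitates that lookup with the JDK only; `SpecificClassLookup` and its methods are hypothetical and simplified (for instance, they ignore names that already contain a dot).

```java
public class SpecificClassLookup {
    /** Full name per the Avro naming rule: namespace + "." + name, or the bare name if there is no namespace. */
    public static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    /** Returns the class with that full name, or null if it cannot be found on the classpath. */
    public static Class<?> resolve(String namespace, String name) {
        try {
            return Class.forName(fullName(namespace, name), false,
                    SpecificClassLookup.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return null; // a reader in this situation would have to fall back to a generic record
        }
    }
}
```

Under that assumption, `resolve(null, "User")` would look for a class named `User` in the default package and miss `com.dhruvk.models.User`, whereas a schema carrying `"namespace": "com.dhruvk.models"` would resolve to the generated class.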


/**
 * Autogenerated by Avro
 */

package com.dhruvk.models; // Package name added by me.

public class User extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");
  public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
  @Deprecated public java.lang.CharSequence name;
  @Deprecated public java.lang.Integer favorite_number;
  @Deprecated public java.lang.CharSequence favorite_color;

  /**
   * Default constructor.  Note that this does not initialize fields
   * to their default values from the schema.  If that is desired then
   * one should use <code>newBuilder()</code>.
   */
  public User() {}

  /**
   * All-args constructor.
   */
  public User(java.lang.CharSequence name, java.lang.Integer favorite_number, java.lang.CharSequence favorite_color) {
    this.name = name;
    this.favorite_number = favorite_number;
    this.favorite_color = favorite_color;
  }

  public org.apache.avro.Schema getSchema() { return SCHEMA$; }
  // Used by DatumWriter.  Applications should not call.
  public java.lang.Object get(int field$) {
    switch (field$) {
    case 0: return name;
    case 1: return favorite_number;
    case 2: return favorite_color;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }
  // Used by DatumReader.  Applications should not call.
  public void put(int field$, java.lang.Object value$) {
    switch (field$) {
    case 0: name = (java.lang.CharSequence)value$; break;
    case 1: favorite_number = (java.lang.Integer)value$; break;
    case 2: favorite_color = (java.lang.CharSequence)value$; break;
    default: throw new org.apache.avro.AvroRuntimeException("Bad index");
    }
  }

  /**
   * Gets the value of the 'name' field.
   */
  public java.lang.CharSequence getName() {
    return name;
  }

  /**
   * Sets the value of the 'name' field.
   * @param value the value to set.
   */
  public void setName(java.lang.CharSequence value) {
    this.name = value;
  }

  /**
   * Gets the value of the 'favorite_number' field.
   */
  public java.lang.Integer getFavoriteNumber() {
    return favorite_number;
  }

  /**
   * Sets the value of the 'favorite_number' field.
   * @param value the value to set.
   */
  public void setFavoriteNumber(java.lang.Integer value) {
    this.favorite_number = value;
  }

  /**
   * Gets the value of the 'favorite_color' field.
   */
  public java.lang.CharSequence getFavoriteColor() {
    return favorite_color;
  }

  /**
   * Sets the value of the 'favorite_color' field.
   * @param value the value to set.
   */
  public void setFavoriteColor(java.lang.CharSequence value) {
    this.favorite_color = value;
  }

  /** Creates a new User RecordBuilder */
  public static User.Builder newBuilder() {
    return new User.Builder();
  }

  /** Creates a new User RecordBuilder by copying an existing Builder */
  public static User.Builder newBuilder(User.Builder other) {
    return new User.Builder(other);
  }

  /** Creates a new User RecordBuilder by copying an existing User instance */
  public static User.Builder newBuilder(User other) {
    return new User.Builder(other);
  }

  /**
   * RecordBuilder for User instances.
   */
  public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<User>
    implements org.apache.avro.data.RecordBuilder<User> {

    private java.lang.CharSequence name;
    private java.lang.Integer favorite_number;
    private java.lang.CharSequence favorite_color;

    /** Creates a new Builder */
    private Builder() {
      super(User.SCHEMA$);
    }

    /** Creates a Builder by copying an existing Builder */
    private Builder(User.Builder other) {
      super(other);
      if (isValidValue(fields()[0], other.name)) {
        this.name = data().deepCopy(fields()[0].schema(), other.name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.favorite_number)) {
        this.favorite_number = data().deepCopy(fields()[1].schema(), other.favorite_number);
        fieldSetFlags()[1] = true;
      }
      if (isValidValue(fields()[2], other.favorite_color)) {
        this.favorite_color = data().deepCopy(fields()[2].schema(), other.favorite_color);
        fieldSetFlags()[2] = true;
      }
    }

    /** Creates a Builder by copying an existing User instance */
    private Builder(User other) {
      super(User.SCHEMA$);
      if (isValidValue(fields()[0], other.name)) {
        this.name = data().deepCopy(fields()[0].schema(), other.name);
        fieldSetFlags()[0] = true;
      }
      if (isValidValue(fields()[1], other.favorite_number)) {
        this.favorite_number = data().deepCopy(fields()[1].schema(), other.favorite_number);
        fieldSetFlags()[1] = true;
      }
      if (isValidValue(fields()[2], other.favorite_color)) {
        this.favorite_color = data().deepCopy(fields()[2].schema(), other.favorite_color);
        fieldSetFlags()[2] = true;
      }
    }

    /** Gets the value of the 'name' field */
    public java.lang.CharSequence getName() {
      return name;
    }

    /** Sets the value of the 'name' field */
    public User.Builder setName(java.lang.CharSequence value) {
      validate(fields()[0], value);
      this.name = value;
      fieldSetFlags()[0] = true;
      return this;
    }

    /** Checks whether the 'name' field has been set */
    public boolean hasName() {
      return fieldSetFlags()[0];
    }

    /** Clears the value of the 'name' field */
    public User.Builder clearName() {
      name = null;
      fieldSetFlags()[0] = false;
      return this;
    }

    /** Gets the value of the 'favorite_number' field */
    public java.lang.Integer getFavoriteNumber() {
      return favorite_number;
    }

    /** Sets the value of the 'favorite_number' field */
    public User.Builder setFavoriteNumber(java.lang.Integer value) {
      validate(fields()[1], value);
      this.favorite_number = value;
      fieldSetFlags()[1] = true;
      return this;
    }

    /** Checks whether the 'favorite_number' field has been set */
    public boolean hasFavoriteNumber() {
      return fieldSetFlags()[1];
    }

    /** Clears the value of the 'favorite_number' field */
    public User.Builder clearFavoriteNumber() {
      favorite_number = null;
      fieldSetFlags()[1] = false;
      return this;
    }

    /** Gets the value of the 'favorite_color' field */
    public java.lang.CharSequence getFavoriteColor() {
      return favorite_color;
    }

    /** Sets the value of the 'favorite_color' field */
    public User.Builder setFavoriteColor(java.lang.CharSequence value) {
      validate(fields()[2], value);
      this.favorite_color = value;
      fieldSetFlags()[2] = true;
      return this;
    }

    /** Checks whether the 'favorite_color' field has been set */
    public boolean hasFavoriteColor() {
      return fieldSetFlags()[2];
    }

    /** Clears the value of the 'favorite_color' field */
    public User.Builder clearFavoriteColor() {
      favorite_color = null;
      fieldSetFlags()[2] = false;
      return this;
    }

    @Override
    public User build() {
      try {
        User record = new User();
        record.name = fieldSetFlags()[0] ? this.name : (java.lang.CharSequence) defaultValue(fields()[0]);
        record.favorite_number = fieldSetFlags()[1] ? this.favorite_number : (java.lang.Integer) defaultValue(fields()[1]);
        record.favorite_color = fieldSetFlags()[2] ? this.favorite_color : (java.lang.CharSequence) defaultValue(fields()[2]);
        return record;
      } catch (Exception e) {
        throw new org.apache.avro.AvroRuntimeException(e);
      }
    }
  }
}
0 answers:
