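How do I serialize a Java object in Hadoop?

Time: 2013-05-30 13:32:22

Tags: java serialization hadoop

An object should implement the Writable interface so that it can be serialized when it is transferred in Hadoop. Take the Lucene ScoreDoc class as an example: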

public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}

How should I serialize it using the Writable interface? What is the relationship between the Writable and java.io.Serializable interfaces?

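2 answers:

Answer 0: (score: 1)

I don't think tampering with the built-in Lucene class is a good idea. Instead, write your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would look something like this: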

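import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable  {

  private ScoreDoc sd;

  // Hadoop instantiates Writables reflectively, so keep a no-arg constructor.
  public MyScoreDoc() {}

  public MyScoreDoc(ScoreDoc sd) {
      this.sd = sd;
  }

  public void write(DataOutput out) throws IOException {
      // ScoreDoc's fields are public, so read them directly
      // instead of parsing the output of toString().
      out.writeFloat(sd.score);
      out.writeInt(sd.doc);
      out.writeInt(sd.shardIndex);
  }

  public void readFields(DataInput in) throws IOException {
      // Read in the same order, and with the same types, as write().
      float score = in.readFloat();
      int doc = in.readInt();
      int shardIndex = in.readInt();

      // The ScoreDoc constructor takes (doc, score, shardIndex).
      sd = new ScoreDoc(doc, score, shardIndex);
  }

  //String toString()
}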

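Answer 1: (score: 0)

First, see {4}: you can use Java serialization, OR

See "Hadoop: Easy way to have object as output value without Writable interface". Otherwise you need to write the write and read functions yourself. That is very simple, because inside them you can call the DataOutput and DataInput API to write and read int, float, String, and so on.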
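To make the difference between the two routes concrete, here is a minimal sketch using only the JDK. `JavaSerializationDemo` and its nested `Hit` class are hypothetical stand-ins (not Hadoop or Lucene classes) with the same three fields as ScoreDoc. It round-trips an object through standard Java serialization and, for comparison, writes the same fields the way a Writable's write method would.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of Hadoop or Lucene.
class JavaSerializationDemo {

    // Stand-in for a Serializable class such as ScoreDoc.
    static class Hit implements Serializable {
        private static final long serialVersionUID = 1L;
        float score;
        int doc;
        int shardIndex;
    }

    // Standard Java serialization: no hand-written read/write code is needed.
    static byte[] javaSerialize(Hit hit) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(hit);
        }
        return buffer.toByteArray();
    }

    static Hit javaDeserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Hit) in.readObject();
        }
    }

    // Writable-style encoding: only the raw field values are written.
    static byte[] rawFields(Hit hit) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeFloat(hit.score);
        out.writeInt(hit.doc);
        out.writeInt(hit.shardIndex);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Hit hit = new Hit();
        hit.score = 1.5f;
        hit.doc = 42;
        hit.shardIndex = 3;

        byte[] serialized = javaSerialize(hit);
        Hit copy = javaDeserialize(serialized);

        System.out.println("round trip ok: " + (copy.doc == hit.doc && copy.score == hit.score));
        System.out.println("Java serialization bytes: " + serialized.length);
        System.out.println("raw field bytes: " + rawFields(hit).length);
    }
}
```

Java serialization stores class metadata next to the field values, so its output is larger than the raw fields alone. To use this route inside a job, Hadoop provides org.apache.hadoop.io.serializer.JavaSerialization, which is registered through the io.serializations configuration property; the sketch above does not exercise that part.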

Your example as a Writable (imports needed):

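import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ScoreDoc implements java.io.Serializable, Writable  {
    /** The score of this document for the query. */
    public float score;//... as in above

  // Hadoop creates Writables reflectively, so a no-arg constructor is also needed.

  public void write(DataOutput out) throws IOException {
      out.writeFloat(score);   // score is a float, so writeFloat rather than writeInt
      out.writeInt(doc);
      out.writeInt(shardIndex);
  }

  public void readFields(DataInput in) throws IOException {
      score = in.readFloat();  // same order and same types as in write()
      doc = in.readInt();
      shardIndex = in.readInt();
  }

  //rest toString etc
}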

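Note: the fields must be read in the same order in which they were written; otherwise the value of one field ends up in another, and if the fields have different types or sizes the data is misinterpreted or the read fails.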
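The ordering rule can be checked without a Hadoop cluster. The following is a minimal sketch using only java.io; `FieldOrderDemo` and its nested `Hit` class are hypothetical stand-ins whose write and readFields methods have the same shape as the Writable methods. The `fromBytesWrongOrder` helper is deliberately wrong, to show what happens when the read order differs from the write order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class; not part of Hadoop or Lucene.
class FieldOrderDemo {

    // Stand-in with the same three fields as ScoreDoc.
    static class Hit {
        float score;
        int doc;
        int shardIndex;

        // Same shape as Writable.write(DataOutput).
        void write(DataOutput out) throws IOException {
            out.writeFloat(score);
            out.writeInt(doc);
            out.writeInt(shardIndex);
        }

        // Same shape as Writable.readFields(DataInput); same order as write().
        void readFields(DataInput in) throws IOException {
            score = in.readFloat();
            doc = in.readInt();
            shardIndex = in.readInt();
        }
    }

    static byte[] toBytes(Hit hit) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        hit.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static Hit fromBytes(byte[] bytes) throws IOException {
        Hit hit = new Hit();
        hit.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return hit;
    }

    // Deliberately wrong: reads doc before score.
    static Hit fromBytesWrongOrder(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        Hit hit = new Hit();
        hit.doc = in.readInt();       // these four bytes are really the score
        hit.score = in.readFloat();   // these four bytes are really the doc
        hit.shardIndex = in.readInt();
        return hit;
    }

    public static void main(String[] args) throws IOException {
        Hit original = new Hit();
        original.score = 1.5f;
        original.doc = 42;
        original.shardIndex = 3;

        byte[] bytes = toBytes(original);
        Hit ok = fromBytes(bytes);
        Hit bad = fromBytesWrongOrder(bytes);

        System.out.println("same order:  doc=" + ok.doc + " score=" + ok.score);
        System.out.println("wrong order: doc=" + bad.doc + " score=" + bad.score);
    }
}
```

Because a float and an int are both four bytes wide, the wrong-order read does not throw here; it silently produces garbage values. When the mismatched types differ in size, the reader gets out of step with the stream and may fail partway through instead.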