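Objects should implement the Writable interface so that they can be serialized when they are transferred in Hadoop. Take the Lucene ScoreDoc class as an example: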
public class ScoreDoc implements java.io.Serializable {
    /** The score of this document for the query. */
    public float score;

    /** Expert: A hit document's number.
     * @see Searcher#doc(int) */
    public int doc;

    /** Only set by {@link TopDocs#merge} */
    public int shardIndex;

    /** Constructs a ScoreDoc. */
    public ScoreDoc(int doc, float score) {
        this(doc, score, -1);
    }

    /** Constructs a ScoreDoc. */
    public ScoreDoc(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }

    // A convenience method for debugging.
    @Override
    public String toString() {
        return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
    }
}
How should I use the Writable interface to serialize it? And what is the relationship between the Writable and java.io.Serializable interfaces?
Answer 0 (score: 1)
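I don't think tampering with the built-in Lucene classes is a good idea. Instead, write your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would look something like this: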
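import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {
    private ScoreDoc sd;

    // Hadoop instantiates Writables via reflection, so a no-arg constructor is required
    public MyScoreDoc() { }

    public MyScoreDoc(ScoreDoc sd) { this.sd = sd; }

    public ScoreDoc getScoreDoc() { return sd; }

    @Override
    public void write(DataOutput out) throws IOException {
        // ScoreDoc's fields are public, so read them directly instead of parsing toString()
        out.writeFloat(sd.score);
        out.writeInt(sd.doc);
        out.writeInt(sd.shardIndex);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Read in exactly the same order and with the same types as write()
        float score = in.readFloat();
        int doc = in.readInt();
        int shardIndex = in.readInt();
        // ScoreDoc's constructor takes (doc, score, shardIndex)
        sd = new ScoreDoc(doc, score, shardIndex);
    }

    @Override
    public String toString() {
        return String.valueOf(sd);
    }
}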
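To see how such a wrapper is used, here is a minimal round-trip sketch: it serializes with write() into a byte array and rebuilds a fresh object with readFields(), which is roughly what Hadoop does when it moves keys and values between tasks. To stay runnable without Hadoop or Lucene on the classpath, it assumes two hypothetical stand-ins: a local `SimpleWritable` interface with the same two method signatures as `org.apache.hadoop.io.Writable`, and a local `ScoreDoc` class with the same public fields as Lucene's. `RoundTrip` is likewise a name chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in with the same two methods as org.apache.hadoop.io.Writable
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in with the same public fields as Lucene's ScoreDoc
class ScoreDoc {
    public float score;
    public int doc;
    public int shardIndex;

    ScoreDoc(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }
}

// The wrapper pattern from the answer: hold a ScoreDoc and serialize its fields
class MyScoreDoc implements SimpleWritable {
    private ScoreDoc sd;

    MyScoreDoc() { }                          // no-arg constructor, as Hadoop expects
    MyScoreDoc(ScoreDoc sd) { this.sd = sd; }
    ScoreDoc get() { return sd; }

    public void write(DataOutput out) throws IOException {
        out.writeFloat(sd.score);
        out.writeInt(sd.doc);
        out.writeInt(sd.shardIndex);
    }

    public void readFields(DataInput in) throws IOException {
        float score = in.readFloat();
        int doc = in.readInt();
        int shardIndex = in.readInt();
        sd = new ScoreDoc(doc, score, shardIndex);
    }
}

class RoundTrip {
    // Serialize with write(), then rebuild an empty instance with readFields()
    static MyScoreDoc copy(MyScoreDoc original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.flush();
        MyScoreDoc restored = new MyScoreDoc();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return restored;
    }

    public static void main(String[] args) throws IOException {
        ScoreDoc sd = copy(new MyScoreDoc(new ScoreDoc(42, 1.5f, 3))).get();
        // prints "doc=42 score=1.5 shardIndex=3"
        System.out.println("doc=" + sd.doc + " score=" + sd.score + " shardIndex=" + sd.shardIndex);
    }
}
```

The empty instance created by the no-arg constructor is then filled in by readFields(), which is why that constructor matters.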
Answer 1 (score: 0)
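First see {4}: you can use Java serialization, OR
see Hadoop: Easy way to have object as output value without Writable interface. Otherwise you need to write your own write and read methods. This is quite simple, because inside them you can call the DataOutput/DataInput API to write and read int, float, String, and so on.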
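On the relationship between the two interfaces: Writable does not extend java.io.Serializable; they are independent serialization mechanisms. One rough way to see why Hadoop uses its own is to compare the bytes each produces for the same three fields. The sketch below uses only java.io and assumes a hypothetical class `Hit` with ScoreDoc's fields; the Writable-style path writes them with a DataOutputStream the same way a write(DataOutput) method would, and `SizeComparison` is also an illustrative name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with the same fields as Lucene's ScoreDoc
class Hit implements Serializable {
    private static final long serialVersionUID = 1L;
    float score;
    int doc;
    int shardIndex;

    Hit(int doc, float score, int shardIndex) {
        this.doc = doc;
        this.score = score;
        this.shardIndex = shardIndex;
    }
}

class SizeComparison {
    // Writable-style: only the raw field values, in a fixed order (4 + 4 + 4 bytes)
    static int writableStyleSize(Hit h) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeFloat(h.score);
        out.writeInt(h.doc);
        out.writeInt(h.shardIndex);
        out.flush();
        return bytes.size();
    }

    // Java serialization: stream header and class descriptor, then the field values
    static int javaSerializationSize(Hit h) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(h);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Hit h = new Hit(42, 1.5f, 3);
        System.out.println("Writable-style bytes: " + writableStyleSize(h)); // 12
        System.out.println("Java serialization bytes: " + javaSerializationSize(h));
    }
}
```

The Writable-style form carries no type information, so the reader must already know the class and the field order; Java serialization is self-describing but larger.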
Your example as a Writable (imports are required):
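import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class ScoreDoc implements java.io.Serializable, Writable {
    /** The score of this document for the query. */
    public float score; // ... doc, shardIndex and constructors as above

    // Hadoop instantiates Writables via reflection, so add a no-arg constructor
    public ScoreDoc() { }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeFloat(score); // score is a float: use writeFloat, not writeInt
        out.writeInt(doc);
        out.writeInt(shardIndex);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Same order and same types as write()
        score = in.readFloat();
        doc = in.readInt();
        shardIndex = in.readInt();
    }

    // rest: toString() etc.
}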
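Note: write and read the fields in the same order. Otherwise one field's value ends up in another, and if the types differ you may get an error while reading or silently corrupted values.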
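To illustrate the note, here is a minimal sketch using plain java.io streams (the same DataOutput/DataInput types that write() and readFields() receive): a float and an int are written in one order and read back in the other. No exception is thrown in this case; the int simply receives the float's raw bit pattern (Float.floatToIntBits(1.5f), i.e. 1069547520), so a mismatch can corrupt data without failing. `OrderMismatch` is a hypothetical name for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class OrderMismatch {
    // Writes a float then an int, but reads them back in the opposite order.
    // Returns the (wrong) int that was decoded from the float's four bytes.
    static int readIntFromFloatBytes(float score, int doc) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeFloat(score);            // write order: score, doc
        out.writeInt(doc);
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        int wrongDoc = in.readInt();      // read order: doc, score (mismatched)
        float wrongScore = in.readFloat();
        System.out.println("doc=" + wrongDoc + " score=" + wrongScore);
        return wrongDoc;
    }

    public static void main(String[] args) throws IOException {
        // No exception: both fields silently receive each other's bit patterns
        readIntFromFloatBytes(1.5f, 7);
    }
}
```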