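I want to compute the average document length for a collection in which each document has three different fields (field1, field2, field3).
With a single field, this is how I compute the average length: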
private byte[] normsDocLengthArr = null;
private double avgDocLength;

// norms() returns the byte-encoded normalization factor of the named field for every document.
normsDocLengthArr = indexReader.norms("filed1");
double sumLength = 0;
for (int i = 0; i < normsDocLengthArr.length; i++) {
    // decodeNorm() decodes a normalization factor stored in the index.
    double encodeLength = DefaultSimilarity.decodeNorm(normsDocLengthArr[i]);
    double length = 1 / (encodeLength * encodeLength);
    sumLength += length;
}
this.avgDocLength = sumLength / normsDocLengthArr.length;
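The conversion inside the loop can be read off from how DefaultSimilarity defines the length norm. The derivation below is a sketch that assumes no index-time boost was folded into the norm; since each norm is stored in a single byte, the recovered lengths are approximations rather than exact term counts. Here N stands for the number of documents.

```latex
\mathrm{norm}(d) = \frac{1}{\sqrt{\mathrm{numTerms}(d)}}
\quad\Longrightarrow\quad
\mathrm{numTerms}(d) = \frac{1}{\mathrm{norm}(d)^{2}},
\qquad
\mathrm{avgDocLength} = \frac{1}{N}\sum_{d=1}^{N}\frac{1}{\mathrm{norm}(d)^{2}}
```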
This is how I extended it to all three fields:
private byte[] normsDocLengthArrField1 = null;
private byte[] normsDocLengthArrField2 = null;
private byte[] normsDocLengthArrField3 = null;
private double avgDocLength;

// norms() returns the byte-encoded normalization factor of the named field for every document.
normsDocLengthArrField1 = indexReader.norms("filed1");
normsDocLengthArrField2 = indexReader.norms("filed2");
normsDocLengthArrField3 = indexReader.norms("filed3");
double sumLength = 0;
for (int i = 0; i < normsDocLengthArrField1.length; i++) {
    // decodeNorm() decodes a normalization factor stored in the index.
    double encodeLengthF1 = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
    double encodeLengthF2 = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
    double encodeLengthF3 = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);
    double length = 1 / ((encodeLengthF1 * encodeLengthF1) + (encodeLengthF2 * encodeLengthF2) + (encodeLengthF3 * encodeLengthF3));
    sumLength += length;
}
this.avgDocLength = sumLength / (normsDocLengthArrField1.length + normsDocLengthArrField2.length + normsDocLengthArrField3.length);
I just want to know whether my implementation of the average document length over three fields is correct.
Answer 0 (score: 0)
I found that the following is the correct way to compute the average document length in Lucene when each document has three fields:
byte[] normsDocLengthArrField1 = indexReader.norms("filed1");
byte[] normsDocLengthArrField2 = indexReader.norms("filed2");
byte[] normsDocLengthArrField3 = indexReader.norms("filed3");
double sumLength = 0;
for (int i = 0; i < normsDocLengthArrField1.length; i++) {
    // decodeNorm() decodes a normalization factor stored in the index.
    double encodeLengthFOne = DefaultSimilarity.decodeNorm(normsDocLengthArrField1[i]);
    double encodeLengthFTwo = DefaultSimilarity.decodeNorm(normsDocLengthArrField2[i]);
    double encodeLengthFThree = DefaultSimilarity.decodeNorm(normsDocLengthArrField3[i]);
    // Convert each field's norm to a length separately, then add the lengths.
    double lengthFieldOne = 1 / (encodeLengthFOne * encodeLengthFOne);
    double lengthFieldTwo = 1 / (encodeLengthFTwo * encodeLengthFTwo);
    double lengthFieldThree = 1 / (encodeLengthFThree * encodeLengthFThree);
    sumLength += lengthFieldOne + lengthFieldTwo + lengthFieldThree;
}
// Divide by the number of documents, not by documents times fields.
this.avgDocLength = sumLength / normsDocLengthArrField1.length;
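The same idea can be written for any number of fields. The sketch below is only an illustration: the class AvgDocLength and its methods are hypothetical names, not part of Lucene, and it works on norms that have already been decoded (for example with decodeNorm) so that it runs without an index. It assumes norm = 1 / sqrt(numTerms) with no boost applied, and it treats a norm of 0 as length 0, which is a choice made here to avoid dividing by zero.

```java
/** Hypothetical helper (not a Lucene class): average document length from decoded norms. */
final class AvgDocLength {

    private AvgDocLength() {}

    /** Approximate term count of one field, assuming norm = 1 / sqrt(numTerms). */
    static double fieldLength(float decodedNorm) {
        // Assumption: a non-positive norm means "no usable length", counted as 0.
        if (decodedNorm <= 0f) {
            return 0.0;
        }
        return 1.0 / ((double) decodedNorm * decodedNorm);
    }

    /**
     * @param decodedNormsPerField one array per field, each indexed by document
     *                             number and all of the same length
     * @return sum of all field lengths divided by the number of documents
     */
    static double averageDocLength(float[]... decodedNormsPerField) {
        if (decodedNormsPerField.length == 0 || decodedNormsPerField[0].length == 0) {
            return 0.0;
        }
        int numDocs = decodedNormsPerField[0].length;
        double sumLength = 0.0;
        for (float[] fieldNorms : decodedNormsPerField) {
            if (fieldNorms.length != numDocs) {
                throw new IllegalArgumentException("all fields must cover the same documents");
            }
            // Convert every norm to a length first, then accumulate the lengths.
            for (float norm : fieldNorms) {
                sumLength += fieldLength(norm);
            }
        }
        // Divide by the number of documents, not documents times fields.
        return sumLength / numDocs;
    }
}
```

For instance, `AvgDocLength.averageDocLength(new float[]{0.5f, 1.0f}, new float[]{0.25f, 0.5f}, new float[]{1.0f, 0.5f})` describes two documents whose field lengths are 4, 16, 1 and 1, 4, 4, and returns 15.0. For the first of those documents, 1 / (n1*n1 + n2*n2 + n3*n3) evaluates to about 0.76 instead of 21, which is why the squared norms cannot be summed before inverting.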