A simple test with Lucene 4.9. Using a RAMDirectory, I index two documents, each with 3 fields [longdata, stringdata, textdata].
Documents:
[2000000L, "hello g", "hello g"] [4000000L, "world", "world"]
Here is my code:
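import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.Version;

public class TermsTest {
    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_4_9);
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);

        // Document 1: [2000000L, "hello g", "hello g"]
        Document doc1 = new Document();
        doc1.add(new LongField("longdata", 2000000L, Field.Store.YES));
        doc1.add(new StringField("stringdata", "hello g", Field.Store.YES));
        doc1.add(new TextField("textdata", "hello g", Field.Store.YES));
        iwriter.addDocument(doc1);

        // Document 2: [4000000L, "world", "world"]
        Document doc2 = new Document();
        doc2.add(new LongField("longdata", 4000000L, Field.Store.YES));
        doc2.add(new StringField("stringdata", "world", Field.Store.YES));
        doc2.add(new TextField("textdata", "world", Field.Store.YES));
        iwriter.addDocument(doc2);
        iwriter.close();

        DirectoryReader ireader = DirectoryReader.open(directory);
        Fields fields = MultiFields.getFields(ireader);

        System.out.println("longdata========");
        Terms terms = fields.terms("longdata");
        TermsEnum iterator = terms.iterator(null);
        BytesRef byteRef;
        while ((byteRef = iterator.next()) != null) {
            // Numeric terms are prefix-coded bytes; decode each one back to a long
            System.out.println(NumericUtils.prefixCodedToLong(byteRef));
        }

        System.out.println("stringdata========");
        Terms strterms = fields.terms("stringdata");
        TermsEnum striterator = strterms.iterator(null);
        BytesRef strbyteRef;
        while ((strbyteRef = striterator.next()) != null) {
            System.out.println(strbyteRef.utf8ToString());
        }

        System.out.println("textdata========");
        Terms textterms = fields.terms("textdata");
        TermsEnum textiterator = textterms.iterator(null);
        BytesRef textbyteRef;
        while ((textbyteRef = textiterator.next()) != null) {
            System.out.println(textbyteRef.utf8ToString());
        }

        ireader.close();
        directory.close();
    }
}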
And here is the output:
longdata========
2000000
4000000
1966080
3997696
0
0
stringdata========
hello g
world
textdata========
g
hello
world
My question is: why does longdata have so many terms (six instead of two)?
Answer:
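In addition to the full-precision value, Lucene indexes each numeric field value at progressively lower precisions, in increments controlled by the precisionStep. A numeric range query can then match a large block of values with a single low-precision term instead of visiting every full-precision term in that block, which makes range queries much more efficient (and faster).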
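To illustrate why this helps, here is a minimal plain-Java sketch with no Lucene dependency. It works on decoded long values only and ignores how Lucene encodes terms into bytes. The class PrecisionBlocks and the helper lowerPrecisionTerm are hypothetical names, and the shift of 16 bits is an assumption inferred from the output in the question. The sketch checks that every value in one 65,536-wide block maps to the same lower-precision term, so one term at that level can stand in for the whole block during a range query.

```java
public class PrecisionBlocks {
    // Hypothetical helper: zeroes the lowest `shift` bits of a value,
    // which is what the decoded lower-precision terms in the output look like.
    static long lowerPrecisionTerm(long value, int shift) {
        return (value >>> shift) << shift;
    }

    public static void main(String[] args) {
        int shift = 16; // assumption: inferred from the longdata output
        long blockStart = lowerPrecisionTerm(2000000L, shift);
        long blockEnd = blockStart + (1L << shift) - 1;

        // Check that every value in [blockStart, blockEnd] shares one coarse term,
        // so a query covering the whole block needs only that single term.
        boolean shared = true;
        for (long v = blockStart; v <= blockEnd; v++) {
            if (lowerPrecisionTerm(v, shift) != blockStart) {
                shared = false;
                break;
            }
        }
        System.out.println(blockStart + ".." + blockEnd + " -> " + blockStart
                + " (shared: " + shared + ")");
    }
}
```

Running it should print 1966080..2031615 -> 1966080 (shared: true), i.e. the extra term 1966080 from the question covers 65,536 consecutive values.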
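The extra terms make more sense in binary. Each lower-precision term is a full-precision value with its lowest bits cleared; the values in the output correspond to shifts of 0, 16, 32 and 48 bits, i.e. a precision step of 16. At shifts of 32 and 48, every set bit of both values is cleared, so both documents share a single 0 term at each of those levels, which is where the two 0s come from: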
4000000 = 0b1111010000100100000000
3997696 = 0b1111010000000000000000
2000000 = 0b111101000010010000000
1966080 = 0b111100000000000000000
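The following plain-Java sketch reproduces the six longdata values. The names NumericTermsSketch, termsFor and distinctTerms are hypothetical, and the precision step of 16 is an assumption inferred from the output. It generates one term per precision level for each value, drops duplicates, and orders the result by shift first and then by value; for non-negative values that ordering matches the term order printed above. Like the previous sketch, it ignores Lucene's byte-level encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class NumericTermsSketch {
    // Assumption: precision step of 16, inferred from the printed values
    static final int PRECISION_STEP = 16;

    // Returns one {shift, value} pair per precision level,
    // with the lowest `shift` bits of the value zeroed.
    static List<long[]> termsFor(long value) {
        List<long[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += PRECISION_STEP) {
            terms.add(new long[] {shift, (value >>> shift) << shift});
        }
        return terms;
    }

    // Collects the distinct terms for all values, ordered by shift and then by value.
    static List<Long> distinctTerms(long... values) {
        TreeSet<long[]> sorted = new TreeSet<>((a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(a[1], b[1]));
        for (long v : values) {
            sorted.addAll(termsFor(v));
        }
        List<Long> decoded = new ArrayList<>();
        for (long[] term : sorted) {
            decoded.add(term[1]);
        }
        return decoded;
    }

    public static void main(String[] args) {
        for (long term : distinctTerms(2000000L, 4000000L)) {
            System.out.println(term);
        }
    }
}
```

Its main method prints 2000000, 4000000, 1966080, 3997696, 0 and 0, one per line: two full-precision terms, two 16-bit-shifted terms, and one shared 0 term each for the 32- and 48-bit shifts.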