Question

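Doing a simple test with Lucene 4.9. I use a RAMDirectory to index two documents, each with 3 fields [longdata, stringdata, textdata]. In the code, both sets of values are added to one Document as multi-valued fields.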

Documents:

[2000000L, "hello g", "hello g"] [4000000L, "world", "world"]

Here is my code:

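import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;
import org.apache.lucene.util.Version;

public class LongTermsTest { // class name is arbitrary

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_4_9);
        Directory directory = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);

        // One Document holding two values per field (multi-valued fields)
        Document doc = new Document();
        doc.add(new LongField("longdata", 2000000L, Field.Store.YES));
        doc.add(new LongField("longdata", 4000000L, Field.Store.YES));
        doc.add(new StringField("stringdata", "hello g", Field.Store.YES));
        doc.add(new StringField("stringdata", "world", Field.Store.YES));
        doc.add(new TextField("textdata", "hello g", Field.Store.YES));
        doc.add(new TextField("textdata", "world", Field.Store.YES));
        iwriter.addDocument(doc);
        iwriter.close();

        // Dump the raw terms of each field from the term dictionary
        DirectoryReader ireader = DirectoryReader.open(directory);
        Fields fields = MultiFields.getFields(ireader);
        System.out.println("longdata========");
        Terms terms = fields.terms("longdata");
        TermsEnum iterator = terms.iterator(null);
        BytesRef byteRef = null;
        while ((byteRef = iterator.next()) != null) {
            // Decodes every term, including the lower-precision ones LongField adds
            System.out.println(NumericUtils.prefixCodedToLong(byteRef));
        }
        System.out.println("stringdata========");
        Terms strterms = fields.terms("stringdata");
        TermsEnum striterator = strterms.iterator(null);
        BytesRef strbyteRef = null;
        while ((strbyteRef = striterator.next()) != null) {
            System.out.println(strbyteRef.utf8ToString());
        }
        System.out.println("textdata========");
        Terms textterms = fields.terms("textdata");
        TermsEnum textiterator = textterms.iterator(null);
        BytesRef textbyteRef = null;
        while ((textbyteRef = textiterator.next()) != null) {
            System.out.println(textbyteRef.utf8ToString());
        }

        ireader.close();
        directory.close();
    }
}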

Here is the output:

longdata========
2000000
4000000
1966080
3997696
0
0
stringdata========
hello g
world
textdata========
g
hello
world

My question is: why does longdata have so many terms?

Answer 1

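Lucene indexes each numeric value as a series of terms at successively lower precision (how many bits are dropped at each level is controlled by the precision step). This makes numeric range queries more efficient (and faster) at finding the matching documents, because a wide range can be covered by a few coarse terms instead of every individual value.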
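To make the "successively lower precision" idea concrete, here is a plain-Java sketch (no Lucene dependency) of which value each term of a single long decodes to. It assumes a precision step of 16, which is what the question's output implies, and it models only what `NumericUtils.prefixCodedToLong` reports for each term (the original value with its lowest `shift` bits cleared), not Lucene's actual byte encoding. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTermsSketch {

    // Hypothetical helper: for each shift 0, step, 2*step, ... below 64, return the
    // value a term at that shift decodes to, i.e. the input value with its lowest
    // `shift` bits cleared.
    static List<Long> decodedTerms(long value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Long> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            long mask = ~((1L << shift) - 1); // clears the lowest `shift` bits
            terms.add(value & mask);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Precision step 16 is an assumption inferred from the question's output
        System.out.println(decodedTerms(2000000L, 16)); // [2000000, 1966080, 0, 0]
        System.out.println(decodedTerms(4000000L, 16)); // [4000000, 3997696, 0, 0]
    }
}
```

Each long therefore contributes one term per precision level (shifts 0, 16, 32 and 48 here), not just the full-precision value.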

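It becomes clearer if you look at the numbers in binary. Each extra term is an original value with its lowest 16 bits cleared, and the two 0 terms are the same values at the next two levels (shifts 32 and 48), where every set bit has been cleared: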

4000000 = 0b1111010000100100000000
3997696 = 0b1111010000000000000000

2000000 = 0b111101000010010000000
1966080 = 0b111100000000000000000
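The whole longdata listing from the question can be reproduced without Lucene. This sketch again assumes a precision step of 16 and models only the decoded values. It groups terms by shift because Lucene stores the shift in the first byte of each numeric term, so the sorted term dictionary lists all full-precision terms first, then the shift-16 terms, and so on. Identical terms are stored once, which is why both values produce a single 0 at shift 32 and at shift 48. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class LongFieldTermListing {

    // Hypothetical sketch: the distinct decoded terms of a multi-valued long field,
    // ordered by shift first and then by value within each shift.
    static List<Long> termListing(long[] values, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        TreeMap<Integer, TreeSet<Long>> byShift = new TreeMap<>();
        for (long value : values) {
            for (int shift = 0; shift < 64; shift += precisionStep) {
                long decoded = value & ~((1L << shift) - 1); // clear lowest `shift` bits
                // The set drops duplicates, mirroring one entry per distinct term
                byShift.computeIfAbsent(shift, k -> new TreeSet<>()).add(decoded);
            }
        }
        List<Long> listing = new ArrayList<>();
        for (Map.Entry<Integer, TreeSet<Long>> level : byShift.entrySet()) {
            listing.addAll(level.getValue());
        }
        return listing;
    }

    public static void main(String[] args) {
        long[] values = {2000000L, 4000000L};
        for (long v : values) {
            System.out.println(v + " = 0b" + Long.toBinaryString(v));
        }
        // Prints 2000000, 4000000, 1966080, 3997696, 0, 0 (one per line)
        for (long term : termListing(values, 16)) {
            System.out.println(term);
        }
    }
}
```

The six printed terms match the longdata section of the question's output line for line.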
