Lucene migration: difference in text field behavior between 3.0.3 and 5.x

Date: 2015-07-21 20:10:36

Tags: java lucene migration

I ran into a problem while migrating Lucene fields from version 3.0.3 to 5.x. To compare the behavior, I wrote two JUnit tests, one against 3.0.3 and one against 5.x.

Lucene 3:

analyzer = new StandardAnalyzer(Version.LUCENE_30);
indexWriter = new IndexWriter(dir, analyzer, true, MaxFieldLength.UNLIMITED);
....
Document doc = new Document();
doc.add(new Field("keyword", "another test@foo-bar", Field.Store.YES,
            Field.Index.ANALYZED));
indexWriter.addDocument(doc);
indexWriter.commit();
....

indexReader = IndexReader.open(FSDirectory.open(path.toFile()), false);
searcher = new IndexSearcher(indexReader);
QueryParser parser = new QueryParser(Version.LUCENE_30, "keyword", analyzer);
Query query = parser.parse("test");
TopDocs topDocs = searcher.search(query, searcher.maxDoc());
ScoreDoc[] hits = topDocs.scoreDocs;
doc = indexReader.document(hits[0].doc);
// doc is now NULL <- EXPECTED
assertNull(doc);

The equivalent test with Lucene 5.x (only the changed lines are shown):

analyzer = new StandardAnalyzer();
IndexWriterConfig indexConfig = new IndexWriterConfig(analyzer)
            .setCommitOnClose(true).setOpenMode(openMode);
// create the index writer
indexWriter = new IndexWriter(dir, indexConfig);
...
// line old style (Lucene 3)
doc.add(new Field("keyword", "another test@foo-bar", Field.Store.YES,
            Field.Index.ANALYZED));
// or with new field types (enable only one line)
doc.add(new TextField("keyword", "another test@foo-bar", Field.Store.YES));
...
Query query = new QueryParser(field, analyzer).parse(field + ":"
                + value);
doc = indexReader.document(hits[0].doc);
// returns a document each time
assertNull(doc); // fails!

I followed the migration guide at https://lucene.apache.org/core/4_8_0/MIGRATE.html and replaced the old analyzed text Field with the TextField class, but the search results differ.

Question: how can I get the same results with the new Lucene 5.x as with Lucene 3?

The Lucene 3 analyzer seems to split the input string only on whitespace, whereas the Lucene 5 analyzer seems to split on whitespace, '@' and '-'. :/
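To make that observation concrete, here is a small plain-Java sketch of the two splitting behaviors as I understand them. It does not use Lucene at all: the class name TokenSplitSketch and its methods are made up for illustration, and the two split rules only model the hypothesis above, not the real tokenizer rules of either version. It shows why the query term "test" would find nothing under the first rule but would match under the second.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models the observed behavior, not Lucene's tokenizers.
final class TokenSplitSketch {

    // Assumed Lucene 3-like behavior: break the input on whitespace only.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Assumed Lucene 5-like behavior: break on whitespace, '@' and '-'.
    static List<String> splitOnWhitespaceAtAndHyphen(String text) {
        return Arrays.asList(text.trim().split("[\\s@-]+"));
    }

    // A single-term query matches only if the term equals one of the indexed tokens.
    static boolean matches(List<String> tokens, String term) {
        return tokens.contains(term);
    }

    public static void main(String[] args) {
        String value = "another test@foo-bar";

        List<String> oldStyle = splitOnWhitespace(value);
        List<String> newStyle = splitOnWhitespaceAtAndHyphen(value);

        System.out.println(oldStyle + " matches 'test': " + matches(oldStyle, "test"));
        System.out.println(newStyle + " matches 'test': " + matches(newStyle, "test"));
    }
}
```

Under the first rule "test@foo-bar" stays a single token, so a search for "test" returns no hit; under the second rule "test" is a token of its own and the document is found, which is the difference both JUnit tests show.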

0 answers:

No answers yet