Question

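I use Lucene to match keywords against a list of words in my application. The whole process is automated, with no human intervention. The best-matching result (the one at the top, with the highest score) is selected from the list of results Lucene returns.

The code below demonstrates this; the results are printed to the console.

Problem:

The problem is that Lucene searches for the keyword (the word to be searched) and returns words that only partially match it. A fuller match is also present in the results, but it is not ranked first.

For example, suppose my Lucene RAM index contains the words 'Test' and 'Test Engineer'. If I search the index for 'AB4_Test Eng_AA0XY11', the results are

Test
Test Engineer

Although 'Eng' in 'AB4_Test Eng_AA0XY11' matches 'Engineer' (which is why it is listed in the results), it does not get the top position. I want to optimize my solution so that 'Test Engineer' comes first, because it is the best match when the whole keyword is considered. Can anyone help me solve this?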

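import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LuceneTest {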

private static void search(Set<String> keywords) {

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    try {
        // 1. create the index
        Directory luceneIndex = buildLuceneIndex(analyzer);

        int hitsPerPage = 5;
        IndexReader reader = IndexReader.open(luceneIndex);

        for(String keyword : keywords) {

            // Create query string. replace all underscore, hyphen, comma, ( , ), {, }, . with plus sign
            StringBuilder querystr = new StringBuilder(128);
            String [] splitName = keyword.split("[\\-_,/(){}:. ]");

            // After tokenizing also add plus sign between each camel case word. 
            for (String token : splitName) {
                querystr.append(token + "+");
            }

            // 3. search
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);

            Query q = new QueryParser(Version.LUCENE_36, "name", analyzer).parse(querystr.toString());
            searcher.search(q, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;

            System.out.println();
            System.out.println(keyword);
            System.out.println("----------------------");
            for (ScoreDoc scoreDoc : hits) {
                Document d = searcher.doc(scoreDoc.doc);
                System.out.println("Found " + d.get("id") +  " : " + d.get("name"));
            }

            // searcher can only be closed when there
            // is no need to access the documents any more
            searcher.close();
        }

    }catch (Exception e) {
        e.printStackTrace();
    }
}

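/**
 * Builds an in-memory index holding the two sample names,
 * "Test Engineer" and "Test", in the "name" field.
 */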
private static Directory buildLuceneIndex(Analyzer analyzer) throws CorruptIndexException, LockObtainFailedException, IOException{

    Map<Integer, String> map = new HashMap<Integer, String>();
    map.put(1, "Test Engineer");
    map.put(2, "Test");

    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);

    // 1. create the index
    IndexWriter w = new IndexWriter(index, config);
    for (Map.Entry<Integer, String> entry : map.entrySet()) {
        try {
            Document doc = new Document();
            doc.add(new Field("id", entry.getKey().toString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("name", entry.getValue() , Field.Store.YES, Field.Index.ANALYZED));
            w.addDocument(doc);

        }catch (Exception e) {
            e.printStackTrace();
        }
    }

    w.close();

    return index;
}


public static void main(String[] args) {

    Set<String> list = new TreeSet<String>();

    list.add("AB4_Test Eng_AA0XY11");
    list.add("AB4_Test Engineer_AA0XY11");

    search(list);
}
}

Answer 1

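You can look at the Lucene Query syntax rules to see how to force a search for Test Engineer.

Basically, a query such as

 AB4_Test AND Eng_AA0XY11

might work, but I'm not sure. The page linked above is quite concise, so you should be able to quickly find a query that fits your needs.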
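
To make the query-syntax idea concrete, here is a minimal sketch that builds two candidate query strings from a raw keyword. The class and method names (`KeywordQueryStrings`, `allRequired`, `optionalPrefixes`) are hypothetical and not part of Lucene; only the resulting strings would be handed to `QueryParser.parse`. Note that on the sample index an all-`AND` query would presumably match nothing, since neither document contains `AB4` or `AA0XY11`. The prefix variant relies on the assumption that `QueryParser` turns `Eng*` into a prefix query that can reach the indexed term `engineer`; this has not been verified here against Lucene 3.6.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordQueryStrings {

    // Same delimiter set as the split(...) call in the question.
    private static final String DELIMITERS = "[\\-_,/(){}:. ]";

    /** Splits a raw keyword into plain ASCII alphanumeric tokens. */
    static List<String> tokens(String keyword) {
        List<String> result = new ArrayList<String>();
        for (String token : keyword.split(DELIMITERS)) {
            // Keep only plain alphanumeric tokens; this drops empty pieces and
            // anything that could contain query-syntax characters.
            if (token.matches("[A-Za-z0-9]+")) {
                result.add(token);
            }
        }
        return result;
    }

    /** Every token is required, e.g. "AB4 AND Test AND Eng AND AA0XY11". */
    static String allRequired(String keyword) {
        return join(tokens(keyword), " AND ", "");
    }

    /** Every token is an optional prefix, e.g. "AB4* Test* Eng* AA0XY11*". */
    static String optionalPrefixes(String keyword) {
        return join(tokens(keyword), " ", "*");
    }

    private static String join(List<String> tokens, String separator, String suffix) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(token).append(suffix);
        }
        return sb.toString();
    }
}
```

With the optional-prefix form, 'Test Engineer' would match two clauses (`Test*` and `Eng*`) while 'Test' matches only one, so it should tend to rank higher. A token that happens to be `AND`, `OR` or `NOT` would still be read as an operator, which this sketch does not handle.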

Answer 2

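If the two results (Test, Test Engineer) have the same ranking score, you will see them in the order in which they appear. You should try using a length filter and boosting terms; then you may be able to come up with a solution.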
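
As a rough illustration of why the two hits may not score the same in the first place: with the query built in the question, the tokens are AB4, Test, Eng and AA0XY11. Assuming StandardAnalyzer indexes 'Test Engineer' as the terms `test` and `engineer`, only `test` matches an indexed term, so both documents match on the same single term. Lucene's default similarity then applies a length normalization of 1/sqrt(number of terms in the field), which favors the shorter field. The sketch below only reproduces that arithmetic; `LengthNormSketch` is a hypothetical helper, the whitespace split is a stand-in for real analysis, and the real norm is additionally stored with reduced precision.

```java
final class LengthNormSketch {

    /** Mirrors the default 1/sqrt(numTerms) length normalization (boost assumed to be 1.0). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Counts whitespace-separated terms; a simplified stand-in for the analyzer. */
    static int termCount(String fieldValue) {
        String trimmed = fieldValue.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

Here `lengthNorm(termCount("Test"))` is 1.0 and `lengthNorm(termCount("Test Engineer"))` is about 0.707, which is consistent with 'Test' being listed first. If that is the cause, indexing the field with `Field.Index.ANALYZED_NO_NORMS` or boosting selected query terms with the `^` syntax are options worth trying.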

See also: what is the best lucene setup for ranking exact matches as the highest
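
Independent of the Lucene setup, the "best match for the whole keyword" criterion from the question can also be applied as a post-processing step over the returned hits. The following is a sketch of that swapped-in technique, not a Lucene feature: it counts how many keyword tokens are a prefix of (or equal to) some word of each candidate name and picks the candidate that covers the most. `WholeKeywordReranker` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

final class WholeKeywordReranker {

    /** Counts keyword tokens that are a prefix of (or equal to) some word of the candidate. */
    static int matchedTokens(List<String> keywordTokens, String candidate) {
        String[] words = candidate.toLowerCase(Locale.ROOT).split("\\s+");
        int matched = 0;
        for (String token : keywordTokens) {
            String t = token.toLowerCase(Locale.ROOT);
            for (String word : words) {
                if (!t.isEmpty() && word.startsWith(t)) {
                    matched++;
                    break; // count each keyword token at most once
                }
            }
        }
        return matched;
    }

    /** Returns the candidate covering the most keyword tokens; earlier candidates win ties. */
    static String best(List<String> keywordTokens, List<String> candidates) {
        String best = null; // stays null when there are no candidates
        int bestCount = -1;
        for (String candidate : candidates) {
            int count = matchedTokens(keywordTokens, candidate);
            if (count > bestCount) {
                bestCount = count;
                best = candidate;
            }
        }
        return best;
    }
}
```

For the tokens AB4, Test, Eng, AA0XY11 and the hits 'Test' and 'Test Engineer', 'Test' covers one token and 'Test Engineer' covers two (Test and Eng), so the latter is selected even when Lucene lists it second.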

Why doesn't Lucene return results based on whole-word matching?
