Question

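I am using Lucene to index the link (edge) structure of Wikipedia. For example, if the page "Albert_Einstein" links to "Theoretical_physics" with the anchor text "theoretical physicist", I create a document representing that edge as follows: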

doc.add(new StringField("src", "Albert_Einstein", Store.YES));
doc.add(new StringField("target", "Theoretical_physics", Store.YES));
doc.add(new StringField("edgeLabel", "theoretical physicist", Store.YES));
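Because StringField indexes its value without tokenizing it, each of the three fields above holds exactly one term. For example, the label is the single term "theoretical physicist", not the two terms "theoretical" and "physicist". The stdlib-only sketch below models that behaviour. The class and helper names are hypothetical, this is not Lucene's API, and the whitespace tokenizer is only a simple stand-in for an analyzing field type.

```java
import java.util.Arrays;
import java.util.List;

public class KeywordVsTokenized {
    // Models a StringField / KeywordAnalyzer: the whole value becomes a single term
    static List<String> keywordTerms(String value) {
        return List.of(value);
    }

    // Hypothetical contrast: a naive lowercase + whitespace tokenizer (not Lucene's StandardAnalyzer)
    static List<String> tokenizedTerms(String value) {
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    // A TermQuery-style check: matches only if the query term equals an indexed term exactly
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String label = "theoretical physicist";
        System.out.println(termMatches(keywordTerms(label), "theoretical"));           // false
        System.out.println(termMatches(keywordTerms(label), "theoretical physicist")); // true
        System.out.println(termMatches(tokenizedTerms(label), "theoretical"));         // true
    }
}
```

Under this model, a term lookup on a keyword field only succeeds for the full, exact value.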

I build the index with KeywordAnalyzer and query the incoming links of target_title as follows:

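public List<Document> searchTitleInLinks(String target_title) {
    List<Document> results = Lists.newArrayList();
    try {
        // "target" was indexed as a StringField, so this is an exact, case-sensitive term match
        Query q = new TermQuery(new Term("target", target_title));
        // Collect at most the top 20 hits
        for (ScoreDoc sc : searcher.search(q, 20).scoreDocs) {
            results.add(searcher.doc(sc.doc));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return results;
}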
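For reference, a TermQuery on a field indexed as a single un-tokenized term should behave like an exact, case-sensitive string-equality lookup, so every hit should carry exactly the queried target. The stdlib-only sketch below models that expectation without Lucene. `Edge`, `EdgeIndex`, and the sample edges (`Page_A`, `Page_B`, `Page_C`) are hypothetical stand-ins, not Lucene's API or real Wikipedia data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeIndexSketch {
    // Hypothetical stand-in for one edge document: three single-valued keyword fields
    record Edge(String src, String target, String edgeLabel) {}

    // Hypothetical stand-in for the index: exact target value -> edges with that target
    static class EdgeIndex {
        private final Map<String, List<Edge>> byTarget = new HashMap<>();

        void add(Edge e) {
            byTarget.computeIfAbsent(e.target(), k -> new ArrayList<>()).add(e);
        }

        // Models a TermQuery on an un-analyzed field: exact equality, at most maxHits results
        List<Edge> searchTitleInLinks(String targetTitle, int maxHits) {
            List<Edge> hits = byTarget.getOrDefault(targetTitle, List.of());
            return hits.subList(0, Math.min(maxHits, hits.size()));
        }
    }

    public static void main(String[] args) {
        EdgeIndex index = new EdgeIndex();
        index.add(new Edge("Albert_Einstein", "Theoretical_physics", "theoretical physicist"));
        index.add(new Edge("Page_A", "United_States", "hypothetical label"));
        index.add(new Edge("Page_B", "U.S._state", "hypothetical label"));
        index.add(new Edge("Page_C", "United_States", "hypothetical label"));
        for (Edge e : index.searchTitleInLinks("United_States", 20)) {
            System.out.println("TARGET " + e.target());
        }
    }
}
```

Running `main` prints `TARGET United_States` twice and nothing else, which is the kind of output the query above is expected to produce.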

For the query "United_States", I should get all edges whose target is "United_States". However, when I run the following:

        List<Document> docs = indexer.searchTitleInLinks("United_States");
        for(Document doc:docs)
        {
            System.out.println("TARGET "+doc.get("target"));
        }

I get seemingly unrelated pages, for example:

TARGET United_States (this is fine)
TARGET U.S._state
TARGET American_English
TARGET United_States
TARGET Non-profit_organization
TARGET Viscosity
TARGET Group_(periodic_table)
TARGET Ocean
TARGET U.S._foreign_policy

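I know the term query is going wrong somewhere, but where? As far as I know, this is the simplest way to use a term query, yet it still returns wrong results. Can anyone suggest a solution?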

Lucene term query not working properly
