How to search for a full phrase in Lucene 4.10

Date: 2014-12-24 09:07:17

Tags: java lucene

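I want to search a PDF for a text phrase, e.g. "Labor Law". However, the results also include documents that merely contain the word "Labor" or the word "Law", not the exact phrase. Please help me check the code below: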

EnglishAnalyzer analyzer = new EnglishAnalyzer();
analyzer.setVersion(Version.LATEST);          

QueryParser parser = new QueryParser("content", analyzer);
Query query = parser.parse("Labor Law");

Directory indexDirectory = FSDirectory.open(new File(indexLucencePath));
DirectoryReader dirReader = DirectoryReader.open(indexDirectory);
indexSearcher = new IndexSearcher(dirReader);

ScoreDoc[] queryResults = indexSearcher.search(query, numOfResults).scoreDocs;

List<IndexItem> results = new ArrayList<IndexItem>();
for (ScoreDoc scoreDoc : queryResults) {
    Document doc = indexSearcher.doc(scoreDoc.doc);
    results.add(new IndexItem(doc.get(IndexItem.TITLE), doc.get(IndexItem.CONTENT)));
}
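Why documents with only one of the words come back: `QueryParser` combines terms with OR by default, so `Labor Law` is parsed into two optional (SHOULD) clauses, roughly `content:labor content:law` after the analyzer lowercases the input. The toy model below is not Lucene code. It is a hypothetical sketch that assumes a naive lowercase tokenizer in place of `EnglishAnalyzer` and only illustrates the resulting OR semantics: a document matches if it contains any of the query terms.

```java
import java.util.Arrays;
import java.util.Locale;

// Toy model (NOT Lucene): shows why a multi-word query parsed with the
// default OR operator also matches documents that contain only one word.
class OrSemanticsDemo {

    // Hypothetical stand-in for an analyzer: lowercase, split on non-letters.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
    }

    // OR semantics: the document matches if ANY query term occurs in it.
    static boolean matchesAny(String doc, String... queryTerms) {
        String[] tokens = tokenize(doc);
        for (String term : queryTerms) {
            if (Arrays.asList(tokens).contains(term.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Labor Law basics",
            "Labor unions and wages",
            "Contract law overview",
            "Gardening tips"
        };
        for (String doc : docs) {
            System.out.println(matchesAny(doc, "Labor", "Law") + "  " + doc);
        }
    }
}
```

In this model the first three documents all match, even though only the first contains the phrase "Labor Law"; only "Gardening tips" is excluded.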

2 Answers:

Answer 0 (score: 2)

Try one of these:

A phrase query (the words must appear next to each other, in this order):

Query query = parser.parse("\"Labor Law\"");

All terms must be present (anywhere in the field, in any order):

Query query = parser.parse("+Labor +Law");
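The practical difference between the phrase query and the `+Labor +Law` query can be sketched with a toy model. This is not Lucene's implementation: it assumes a naive lowercase tokenizer instead of `EnglishAnalyzer` (which also removes stop words and stems), and positions are plain token indexes. `matchesAll` mimics `+Labor +Law` (both terms required, anywhere, any order), while `matchesPhrase` mimics `"Labor Law"` with the parser's default phrase slop of 0 (terms adjacent and in order). All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model (NOT Lucene): contrasts "+Labor +Law" (all terms required,
// anywhere) with "\"Labor Law\"" (terms adjacent and in order, slop 0).
class AndVsPhraseDemo {

    // Hypothetical stand-in for an analyzer: lowercase, split on non-letters.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"));
    }

    // AND semantics: every query term must occur somewhere in the document.
    static boolean matchesAll(String doc, String... queryTerms) {
        List<String> tokens = tokenize(doc);
        for (String term : queryTerms) {
            if (!tokens.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    // Exact-phrase semantics: the terms must occupy consecutive positions,
    // in the given order.
    static boolean matchesPhrase(String doc, String... phraseTerms) {
        List<String> tokens = tokenize(doc);
        outer:
        for (int start = 0; start + phraseTerms.length <= tokens.size(); start++) {
            for (int i = 0; i < phraseTerms.length; i++) {
                if (!tokens.get(start + i).equals(phraseTerms[i].toLowerCase(Locale.ROOT))) {
                    continue outer; // mismatch: try the next starting position
                }
            }
            return true; // every term matched at consecutive positions
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "An introduction to Labor Law",
            "Law firms that handle labor disputes",
            "Labor unions and wages"
        };
        for (String doc : docs) {
            System.out.printf("AND=%b PHRASE=%b  %s%n",
                matchesAll(doc, "Labor", "Law"),
                matchesPhrase(doc, "Labor", "Law"),
                doc);
        }
    }
}
```

Here "Law firms that handle labor disputes" satisfies the AND query but not the phrase query, which is why the quoted form is the one that answers the question.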

You can also build the query yourself, like this:

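// TermQuery is not analyzed: use the indexed form of each word. Assumes "content"
// was indexed with EnglishAnalyzer, which lowercases (and stems) tokens.
BooleanQuery query = new BooleanQuery();
TermQuery clause1 = new TermQuery(new Term("content", "labor"));
TermQuery clause2 = new TermQuery(new Term("content", "law"));
query.add(new BooleanClause(clause1, BooleanClause.Occur.MUST));
query.add(new BooleanClause(clause2, BooleanClause.Occur.MUST));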

Answer 1 (score: 1)

There are different types of analyzers to choose from; pick the one that fits your requirements. See Comparison of Lucene Analyzers. This may also help: Lucene: Multi-word phrases as search terms