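I want to search for a text phrase in a PDF, for example "Labor Law". However, the results also include documents that contain only the word "Labor" or only the word "Law". Please help me check the code below: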
EnglishAnalyzer analyzer = new EnglishAnalyzer();
analyzer.setVersion(Version.LATEST);
QueryParser parser = new QueryParser("content", analyzer);
Query query = parser.parse("Labor Law");
Directory indexDirectory = FSDirectory.open(new File(indexLucencePath));
DirectoryReader dirReader = DirectoryReader.open(indexDirectory);
indexSearcher = new IndexSearcher(dirReader);
ScoreDoc[] queryResults = indexSearcher.search(query, numOfResults).scoreDocs;
List<IndexItem> results = new ArrayList<IndexItem>();
for (ScoreDoc scoreDoc : queryResults) {
    Document doc = indexSearcher.doc(scoreDoc.doc);
    results.add(new IndexItem(doc.get(IndexItem.TITLE), doc.get(IndexItem.CONTENT)));
}
Answer 0 (score: 2)
Try one of the following.
Phrase query (the terms must be adjacent and in order):
Query query = parser.parse("\"Labor Law\"");
All terms must be present (in any position):
Query query = parser.parse("+Labor +Law");
You can also build the query yourself, like this:
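// TermQuery terms are not analyzed, so they must match the indexed form.
// Lowercase is used here, assuming the index was built with EnglishAnalyzer.
BooleanQuery query = new BooleanQuery();
TermQuery clause1 = new TermQuery(new Term("content", "labor"));
TermQuery clause2 = new TermQuery(new Term("content", "law"));
query.add(new BooleanClause(clause1, BooleanClause.Occur.MUST));
query.add(new BooleanClause(clause2, BooleanClause.Occur.MUST));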
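To make the difference between the three query forms concrete, here is a minimal toy model in plain Java. It does not use Lucene and is not how Lucene matches documents internally. The class `QueryModes`, its methods, and the sample sentences are all hypothetical, and `tokenize` is only a rough stand-in for an analyzer. It shows why the original query, parsed with QueryParser's default OR operator, also returns documents that contain just one of the two words.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of the three query styles; illustrative only, not Lucene code.
final class QueryModes {

    // Rough stand-in for analysis: lowercase, then split on non-letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like parse("Labor Law") with the default OR operator: one term is enough.
    static boolean matchesAny(String doc, String query) {
        List<String> docTokens = tokenize(doc);
        for (String term : tokenize(query)) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Like parse("+Labor +Law"): every term must occur, in any position.
    static boolean matchesAll(String doc, String query) {
        return tokenize(doc).containsAll(tokenize(query));
    }

    // Like parse("\"Labor Law\""): terms must be adjacent and in order.
    static boolean matchesPhrase(String doc, String query) {
        List<String> queryTokens = tokenize(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(doc), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Labor law protects workers",
            "The law regulates child labor",
            "Labor unions negotiate wages"
        };
        for (String doc : docs) {
            System.out.println(doc
                    + " | any=" + matchesAny(doc, "Labor Law")
                    + " all=" + matchesAll(doc, "Labor Law")
                    + " phrase=" + matchesPhrase(doc, "Labor Law"));
        }
    }
}
```

In this sketch only the first sample sentence satisfies the phrase check, the second satisfies the all-terms check but not the phrase check, and the third matches only under OR, which is the behavior described in the question.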
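Answer 1 (score: 1)
There are different types of analyzers available; try different ones and pick the one that fits your requirements. See Comparison of Lucene Analyzers. This may also help: Lucene: Multi-word phrases as search terms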