I'm trying to write a simple program with Lucene 2.9.4 that runs a phrase query, but I get 0 hits.
public class HelloLucene {
public static void main(String[] args) throws IOException, ParseException{
// TODO Auto-generated method stub
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
Directory index = new RAMDirectory();
IndexWriter w = new IndexWriter(index,analyzer,true,IndexWriter.MaxFieldLength.UNLIMITED);
addDoc(w, "Lucene in Action");
addDoc(w, "Lucene for Dummies");
addDoc(w, "Managing Gigabytes");
addDoc(w, "The Art of Computer Science");
w.close();
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"),0);
pq.add(new Term("content", "in"),1);
pq.setSlop(0);
int hitsPerPage = 10;
IndexSearcher searcher = new IndexSearcher(index,true);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(pq, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + hits.length + " hits.");
for(int i=0; i<hits.length; i++){
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println((i+1)+ "." + d.get("content"));
}
searcher.close();
}
public static void addDoc(IndexWriter w, String value)throws IOException{
Document doc = new Document();
doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
w.addDocument(doc);
}
}
Please tell me what is wrong. I also tried using QueryParser, as follows:
String querystr ="\"Lucene in Action\"";
Query q = new QueryParser(Version.LUCENE_29, "content",analyzer).parse(querystr);
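But that doesn't work either.
Answer 0 (score: 4)
There are two problems with the code (neither is related to your Lucene version):
1) StandardAnalyzer does not index stop words (such as "in"), so the PhraseQuery can never find the phrase "Lucene in".
2) As Xodarap and Shashikant Kore mentioned, the call that creates your document needs to use Index.ANALYZED; otherwise Lucene does not run the Analyzer on this field. There may be a good way to do this with Index.NOT_ANALYZED, but I'm not familiar with it.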
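To make the two problems concrete, here is a minimal plain-Java sketch (no Lucene on the classpath) of which terms end up in the index in each case. The class name, the helper methods, and the tiny stop word set are all made up for illustration; this is a simplified model of the behaviour described above, not Lucene's actual implementation, and it assumes each word occurs only once in a value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class IndexedTermsSketch {
    // Illustrative subset of English stop words; NOT Lucene's actual list.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("in", "for", "the", "of"));

    // Models Field.Index.NOT_ANALYZED: the whole value becomes one term at position 0.
    static Map<String, Integer> notAnalyzed(String value) {
        Map<String, Integer> terms = new LinkedHashMap<String, Integer>();
        terms.put(value, 0);
        return terms;
    }

    // Models Field.Index.ANALYZED with a stop-word-removing analyzer:
    // lowercase, split on whitespace, drop stop words but keep their position slots.
    static Map<String, Integer> analyzed(String value) {
        Map<String, Integer> terms = new LinkedHashMap<String, Integer>();
        String[] tokens = value.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (!STOP_WORDS.contains(tokens[pos])) {
                terms.put(tokens[pos], pos);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(notAnalyzed("Lucene in Action")); // {Lucene in Action=0}
        System.out.println(analyzed("Lucene in Action"));    // {lucene=0, action=2}
    }
}
```

In the first case there is no term "lucene" at all, so no term-based query for it can match. In the second case "in" is gone and "action" sits at position 2, which is why a zero-slop phrase "lucene in" finds nothing.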
For an easy fix, change your addDoc method to:
public static void addDoc(IndexWriter w, String value)throws IOException{
Document doc = new Document();
doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
w.addDocument(doc);
}
And change the PhraseQuery you create to:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "computer"),0);
pq.add(new Term("content", "science"),1);
pq.setSlop(0);
This gives you the following result, because neither "computer" nor "science" is a stop word:
Found 1 hits.
1.The Art of Computer Science
If you want to find "Lucene in Action", you can increase the slop of this PhraseQuery (that is, widen the allowed "gap" between the two words):
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"),0);
pq.add(new Term("content", "action"),1);
pq.setSlop(1);
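To see why a slop of 1 is enough here, the following plain-Java sketch models the check for a two-term phrase. It is a simplification under stated assumptions: the class and method names are hypothetical, it only handles two distinct terms, and it assumes the removed stop word still occupies a position, so "lucene" is at position 0 and "action" at position 2.

```java
class PhraseSlopSketch {
    // Simplified two-term slop rule (a model, not Lucene's scorer code).
    // docPosA/docPosB: positions of the two terms in the indexed document.
    // queryPosA/queryPosB: positions passed to PhraseQuery.add(term, position).
    static boolean matches(int docPosA, int docPosB,
                           int queryPosA, int queryPosB, int slop) {
        int docGap = docPosB - docPosA;       // distance between the terms in the document
        int queryGap = queryPosB - queryPosA; // distance the query asks for
        return Math.abs(docGap - queryGap) <= slop;
    }

    public static void main(String[] args) {
        // "lucene"@0, "action"@2 in the document; query asks for adjacent terms.
        System.out.println(matches(0, 2, 0, 1, 0)); // false
        System.out.println(matches(0, 2, 0, 1, 1)); // true
        // Asking for a one-position gap in the query itself, with slop 0.
        System.out.println(matches(0, 2, 0, 2, 0)); // true
    }
}
```

The last case suggests an alternative that should also work in Lucene: keep the slop at 0 and add "action" at query position 2 instead of 1, so the query itself leaves room for the missing stop word.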
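If you really want to search for the phrase "lucene in", you need to choose an analyzer that does not remove stop words (SimpleAnalyzer, for example). In Lucene 2.9, just replace your StandardAnalyzer construction with: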
SimpleAnalyzer analyzer = new SimpleAnalyzer();
Or, if you are using version 3.1 or later, you need to pass the version:
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
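As a rough picture of what changes with SimpleAnalyzer, here is a plain-Java sketch of a tokenizer that splits on non-letter characters and lowercases, without any stop word filtering. The class and method names are invented for illustration and this is only an approximation of the analyzer's behaviour, not its real code.

```java
import java.util.ArrayList;
import java.util.List;

class LetterTokenizerSketch {
    // Splits on any non-letter character and lowercases each token.
    // No stop words are removed, so "in" survives as a term.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene in Action")); // [lucene, in, action]
    }
}
```

With "in" kept at position 1, the original zero-slop PhraseQuery for "lucene" followed by "in" has something to match. Remember to use the same analyzer at index time and at query time.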
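Here is a useful post about a similar problem (it will help you get started with PhraseQuery): Exact Phrase search using Lucene? - see WhiteFang34's answer.
Answer 1 (score: 1)
The field needs to be analyzed, and term vectors need to be enabled.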
doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
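If you don't plan to retrieve the field from the index, you can disable storing it.
Answer 2 (score: 0)
Here is my solution using Lucene Version.LUCENE_35, that is, the Lucene 3.5.0 release listed at http://lucene.apache.org/java/docs/releases.html. If you are using an IDE such as Eclipse, you can add the .jar file to the build path; here is a direct link to the 3.5.0 jar: http://repo1.maven.org/maven2/org/apache/lucene/lucene-core/3.5.0/lucene-core-3.5.0.jar.
When newer versions of Lucene come out, this solution will still apply as long as you keep using the 3.5.0 jar.
Now the code: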
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class Index {
public static void main(String[] args) throws IOException, ParseException {
// To store the Lucene index in RAM
Directory directory = new RAMDirectory();
// To store the Lucene index in your harddisk, you can use:
//Directory directory = FSDirectory.open(new java.io.File("/foo/bar/testindex"));
// Set the analyzer that you want to use for the task.
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
// Creating Lucene Index; note, the new version demands configurations.
IndexWriterConfig config = new IndexWriterConfig(
Version.LUCENE_35, analyzer);
IndexWriter writer = new IndexWriter(directory, config);
// Note: There are other ways of initializing the IndexWriter.
// (see http://lucene.apache.org/java/3_5_0/api/all/org/apache/lucene/index/IndexWriter.html)
// Document.add in this version of Lucene requires a Field argument,
// and there are a few ways of calling the Field constructor.
// (see http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/document/Field.html)
// Here I just use one of the Field constructors that takes a String value.
List<Document> docs = new ArrayList<Document>();
Document doc1 = new Document();
doc1.add(new Field("content", "Lucene in Action",
Field.Store.YES, Field.Index.ANALYZED));
Document doc2 = new Document();
doc2.add(new Field("content", "Lucene for Dummies",
Field.Store.YES, Field.Index.ANALYZED));
Document doc3 = new Document();
doc3.add(new Field("content", "Managing Gigabytes",
Field.Store.YES, Field.Index.ANALYZED));
Document doc4 = new Document();
doc4.add(new Field("content", "The Art of Lucene",
Field.Store.YES, Field.Index.ANALYZED));
docs.add(doc1); docs.add(doc2); docs.add(doc3); docs.add(doc4);
writer.addDocuments(docs);
writer.close();
// To enable query/search, we need to initialize
// the IndexReader and IndexSearcher.
// Note: The IndexSearcher in Lucene 3.5.0 takes an IndexReader parameter
// instead of a Directory parameter.
IndexReader iRead = IndexReader.open(directory);
IndexSearcher iSearch = new IndexSearcher(iRead);
// Parse a simple query that searches for the word "lucene".
// Note: you need to specify the fieldname for the query
// (in our case it is "content").
QueryParser parser = new QueryParser(Version.LUCENE_35, "content", analyzer);
Query query = parser.parse("lucene in");
// Search the Index with the Query, with max 1000 results
ScoreDoc[] hits = iSearch.search(query, 1000).scoreDocs;
// Iterate through the search results
for (int i=0; i<hits.length;i++) {
// From the indexSearch, we retrieve the search result individually
Document hitDoc = iSearch.doc(hits[i].doc);
// Specify the Field type of the retrieved document that you want to print.
// In our case we only have 1 Field i.e. "content".
System.out.println(hitDoc.get("content"));
}
iSearch.close(); iRead.close(); directory.close();
}
}