Question

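Basically, I indexed 85k HTML files (Google results pages, where the keywords are different university names), and in each Lucene document I store the page's title in a field named "title". When I search with a keyword like "duquesne AND university", no results come back. However, when I change the keyword to "duquesne", I do get the title result: "title: Duquesne Univeristy - Google Search". Why is this happening? From the second attempt I can tell that the file with the title Duquesne Univeristy has been indexed, but I cannot retrieve it with the first query. Many thanks!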

Here is the code that builds the index, followed by the search code. I use Jsoup to get the title from each web page:

//indexDir is the directory that hosts Lucene's index files
File indexDir = new File("F:\\luceneIndex");
Directory myindex = SimpleFSDirectory.open(indexDir);
//dataDir is the directory that hosts the text files that to be indexed
File dataDir = new File("I:\\luceneTextFiles");
Analyzer luceneAnalyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
File[] dataFiles = dataDir.listFiles();
IndexWriterConfig indexConfig = new IndexWriterConfig(Version.LUCENE_CURRENT, luceneAnalyzer);
IndexWriter indexWriter = new IndexWriter(myindex, indexConfig);
long startTime = new Date().getTime();
System.out.println("Total file number is " + dataFiles.length + "");
for (int i = 0; i < dataFiles.length; i++) {
    if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
        org.jsoup.nodes.Document t = Jsoup.parse(dataFiles[i], "UTF-8");
        Document document = new Document();
        Reader txtReader = new FileReader(dataFiles[i]);
        document.add(new Field("title", t.title(), Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("path", dataFiles[i].getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        document.add(new Field("count", i + "", Field.Store.YES, Field.Index.NOT_ANALYZED));
        document.add(new Field("contents", txtReader));
        indexWriter.addDocument(document);
    }
}
//indexWriter.getCommitData();
indexWriter.close();
long endTime = new Date().getTime();

String queryKey = "duquesne";
String subqueryKey = "university";
String queryField = "contents";
String subqueryField = "title";
/*
 * 0------>normal search
 * 1------>range search
 * 2------>prefix search
 * 3------>combine search
 * 4------>phrase query
 * 5------>wild card query
 * 6------>fuzzy query
 */
int querychoice = 0;
//initialize the directory
File indexDir = new File("F:\\luceneIndex");
Directory directory = SimpleFSDirectory.open(indexDir);
IndexReader reader = IndexReader.open(directory);
//initialize the searcher
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
Query query;
switch (querychoice) {
    case 0:
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, subqueryField, analyzer);
        query = parser.parse(queryKey);
        break;

Answer 1

Hmm, maybe it's because the search keyword university and the indexed word Univeristy are not the same word? Or did you just misspell it in your question?
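To make the point concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes a very rough stand-in for analysis (lowercase, split on non-alphanumeric characters) and exact term matching with AND semantics. The class and method names (`TermMatchSketch`, `tokens`, `matchesAll`) are made up for illustration and are not Lucene APIs.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TermMatchSketch {
    // Rough stand-in for an analyzer (assumption, not Lucene's StandardAnalyzer):
    // lowercase the text and split on anything that is not a letter or digit.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // AND semantics: every query term must appear as an exact term of the title.
    static boolean matchesAll(String title, String... queryTerms) {
        Set<String> indexed = tokens(title);
        for (String q : queryTerms) {
            if (!indexed.contains(q.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String title = "Duquesne Univeristy - Google Search";
        System.out.println(matchesAll(title, "duquesne"));               // true
        System.out.println(matchesAll(title, "duquesne", "university")); // false
        System.out.println(matchesAll(title, "duquesne", "univeristy")); // true
    }
}
```

If the title really was indexed with the typo, an exact-term AND query for "university" has nothing to match, while the single term "duquesne" still finds the document.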

Answer 2

Parsing title:Duquesne Univeristy - Google Search with the standard analyzer will result in the query title:duquesne defaultfield:univeristy defaultfield:google defaultfield:search, where the clauses are ORed together.
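The field-prefix behavior described here can be imitated with a small plain-Java sketch. This is a simplified imitation and not Lucene's QueryParser: it ignores operators, quoting and grouping, and the names `FieldPrefixSketch`, `clauses` and the default field "contents" are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FieldPrefixSketch {
    // Splits the query on whitespace. A "field:" prefix applies only to the
    // token it is attached to; every other token falls back to defaultField.
    static List<String> clauses(String query, String defaultField) {
        List<String> out = new ArrayList<>();
        for (String raw : query.trim().split("\\s+")) {
            String field = defaultField;
            String text = raw;
            int colon = raw.indexOf(':');
            if (colon > 0) {
                field = raw.substring(0, colon);
                text = raw.substring(colon + 1);
            }
            // Crude normalization (assumption): lowercase, drop non-alphanumerics.
            String term = text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            if (!term.isEmpty()) {
                out.add(field + ":" + term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the first term is bound to "title".
        System.out.println(clauses("title:Duquesne Univeristy", "contents"));
        // prints [title:duquesne, contents:univeristy]

        // Prefixing each term keeps both in "title".
        System.out.println(clauses("title:Duquesne title:Univeristy", "contents"));
        // prints [title:duquesne, title:univeristy]
    }
}
```

In Lucene's own query syntax, field grouping such as title:(duquesne univeristy) is the usual way to apply one field to several terms.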

Lucene: the file exists, but I cannot get it using QueryParser
