Lucene IndexSearcher always returns 0 totalHits

Date: 2014-06-26 12:52:58

Tags: lucene

The search method of Lucene's IndexSearcher returns nothing: the number of documents matched by a query is always 0. I built the index with the following code:

    void buildIndex(File indexDir, File trainDir, HashMap<String, Integer> dictionary)
            throws IOException, FileNotFoundException {

        // VERSION and mAnalyzer are defined outside this method (not shown in the question)
        Directory fsDir = FSDirectory.open(indexDir);
        IndexWriterConfig iwConf
            = new IndexWriterConfig(VERSION, mAnalyzer);

        iwConf.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        IndexWriter indexWriter
            = new IndexWriter(fsDir, iwConf);

        File file = trainDir;
        String csvFilename = "/home/serene/Downloads/IndustryClassification/Train/Training.csv";
        CSVReader csvReader = new CSVReader(new FileReader(csvFilename), '\t');
        String[] row = null;
        while ((row = csvReader.readNext()) != null) {
            Document d = new Document();
            String companyname = row[1];
            String NAICSID = row[2];
            //System.out.println(NAICSID);
            String description = row[4];
            // Indexed fields: "company" and "description" are analyzed (TextField),
            // "category" is indexed as one untokenized term (StringField),
            // and "description" is searchable but not stored (Store.NO).
            d.add(new TextField("company", companyname, Store.YES));
            d.add(new StringField("category", NAICSID, Store.YES));
            dictionary.put(NAICSID, 1);
            d.add(new TextField("description", description, Store.NO));
            //System.out.println(d.toString());
            indexWriter.addDocument(d);
        }
        csvReader.close();
        int numDocs = indexWriter.numDocs();
        indexWriter.forceMerge(1);
        indexWriter.commit();
        indexWriter.close();
        System.out.println("index=" + indexDir.getName());
        System.out.println("num docs=" + numDocs);
    }
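To make the resulting field layout concrete, here is a stdlib-only toy model (not Lucene's API) of the terms each field would contribute for one CSV row, using the same column positions as above (`row[1]`, `row[2]`, `row[4]`). The class `FieldLayout` and its `tokenize` helper are hypothetical: `tokenize` only lowercases and splits on non-alphanumeric characters, whereas real analyzers such as `StandardAnalyzer` do more (for example, stop-word removal). The sample row is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the per-field terms produced by buildIndex() for one CSV row.
// Not Lucene: tokenize() is a simplified, hypothetical stand-in for an analyzer.
class FieldLayout {

    // Lowercase the text and split it on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Mirrors the field choices in buildIndex():
    // TextField -> analyzed into terms, StringField -> whole value as a single term.
    static Map<String, List<String>> termsForRow(String[] row) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("company", tokenize(row[1]));                     // TextField
        fields.put("category", Collections.singletonList(row[2]));  // StringField
        fields.put("description", tokenize(row[4]));                 // TextField, Store.NO
        return fields;
    }

    public static void main(String[] args) {
        // Invented sample row; only columns 1, 2 and 4 are used.
        String[] row = {"1", "Acme Steel Corp", "331110", "x", "Steel mill products"};
        System.out.println(termsForRow(row));
        // prints: {company=[acme, steel, corp], category=[331110], description=[steel, mill, products]}
    }
}
```

Every document therefore carries exactly three field names: `company`, `category` and `description`. Because `description` uses `Store.NO`, its terms can be matched by a query, but `searcher.doc(docId).get("description")` returns null.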

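When I try to get results for the test queries with the code below, I get no categories at all: `scoreDocs.length` is always 0, so the code inside the for loop never runs.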

    void testIndex(File indexDir, File testDir, Set<String> NEWSGROUPS)
            throws IOException, FileNotFoundException, ParseException {
        Directory fsDir = FSDirectory.open(indexDir);
        DirectoryReader reader = DirectoryReader.open(fsDir);
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer(VERSION);
        System.out.print("inside testIndex");
        int[][] confusionMatrix
            = new int[NEWSGROUPS.size()][NEWSGROUPS.size()];
        String csvFilename = "/home/serene/Downloads/IndustryClassification/Test/Test.csv";
        CSVReader csvReader = new CSVReader(new FileReader(csvFilename), '\t');
        String[] row = null;
        while ((row = csvReader.readNext()) != null) {
            String companyname = row[1];
            String NAICSID = row[2];
            String description = row[4];

            Query query = new QueryParser(Version.LUCENE_44, "contents", analyzer).parse(QueryParser.escape(description));

            System.out.print(query + "\n");
            TopDocs hits = searcher.search(query, 3);
            ScoreDoc[] scoreDocs = hits.scoreDocs;
            System.out.println(hits.totalHits);
            for (int n = 0; n < scoreDocs.length; ++n) {
                ScoreDoc sd = scoreDocs[n];
                int docId = sd.doc;
                Document d = searcher.doc(docId);
                String category = d.get("category");
                System.out.println(category);
            }
        }
        csvReader.close();
    }
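The `query` printed on each iteration shows which field every term is searched in: `QueryParser` assigns its default field (the second constructor argument) to every term that has no explicit `field:` prefix. The sketch below is a simplified, stdlib-only imitation of how such a free-text query prints, not Lucene code; `DefaultFieldDemo` and `renderQuery` are hypothetical names, and the real parser also runs the analyzer (for `StandardAnalyzer`: tokenization, lowercasing, stop-word removal) and understands query syntax.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Simplified imitation of a printed free-text query:
// every unqualified term is attributed to the parser's default field.
class DefaultFieldDemo {

    // Hypothetical helper: lowercase, split on whitespace, prefix each term with the field.
    static String renderQuery(String defaultField, String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                out.add(defaultField + ":" + term);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderQuery("contents", "Steel mill products"));
        // prints: contents:steel contents:mill contents:products
    }
}
```

Comparing the field name in the printed query with the fields added in `buildIndex` (`company`, `category`, `description`) is a quick way to spot a mismatch.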

1 answer:

Answer 0 (score: 0)

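Replace `"contents"` with one of the fields you actually indexed (`company`, `description`, ...). The second argument of the `QueryParser` constructor is the default field: every term without an explicit `field:` prefix is searched in that field. Your index has no `contents` field, so no document can match and `totalHits` is always 0. Since the query text comes from the description column, `new QueryParser(Version.LUCENE_44, "description", analyzer)` is the natural choice.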
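A wrong field name produces zero hits rather than an error because an inverted index keeps separate postings for each field; a term looked up in a field that was never indexed simply has no postings. The stdlib-only toy below (not Lucene's API; `ToyFieldIndex`, `add` and `search` are hypothetical names) models this with OR semantics, similar to `QueryParser`'s default operator, and uses whitespace splitting instead of a real analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy per-field inverted index: field -> term -> ids of documents containing that term.
class ToyFieldIndex {
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Index the whitespace-separated, lowercased terms of text under the given field.
    void add(int docId, String field, String text) {
        Map<String, Set<Integer>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // OR query: a document matches if any query term occurs in the given field.
    // An unknown field has no postings at all, so the result is empty.
    Set<Integer> search(String field, String text) {
        Set<Integer> hits = new TreeSet<>();
        Map<String, Set<Integer>> terms = postings.getOrDefault(field, Collections.emptyMap());
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            hits.addAll(terms.getOrDefault(t, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(0, "company", "Acme Steel Corp");
        index.add(0, "description", "Steel mill products");
        index.add(1, "company", "Bolt Bakery");
        index.add(1, "description", "Fresh bread and pastries");

        System.out.println(index.search("contents", "steel products").size()); // prints: 0
        System.out.println(index.search("description", "steel products"));     // prints: [0]
    }
}
```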