Sample code for searching PDF text with Lucene 4.6 and PDFBox

Asked: 2014-01-09 12:17:56

Tags: lucene pdfbox

I am using Lucene 4.6 to search for a phrase in a PDF. I wrote the code below, but it throws errors on the "Analyzer" and "QueryPhrase" lines. Please help me fix this.

            Analyzer analyzer = new Analyzer(Version.LUCENE_44);

            // Store the index in memory:               
            Directory directory = new RAMDirectory();
            // To store an index on disk, use this instead:
            //Directory directory = FSDirectory.open("/tmp/testindex");
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Document doc = new Document();
            String text = "This is the text to be indexed.";
            doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
            iwriter.addDocument(doc);
            iwriter.close();

            // Now search the index
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher isearcher = new IndexSearcher(ireader);
            // Parse a simple query that searches for "text":
            QueryParser parser = new QueryParser(Version.LUCENE_44, "fieldname", analyzer);
            Query query = parser.parse("text");
            ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
            // Iterate through the results:
            if(hits.length>0){
                System.out.println("Searched text existed in the PDF.");
            }
            ireader.close();
            directory.close();
         }
         catch(Exception e){
             System.out.println("Exception: "+e.getMessage());
         }
 }

1 answer:

Answer 0 (score: 1)

You cannot instantiate the abstract class Analyzer. Instead, you can write something like:

Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_44);
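To illustrate why the original line fails, here is a minimal, self-contained sketch in plain Java. It does not use Lucene: `SketchAnalyzer`, `LowercaseSketchAnalyzer`, and `AbstractAnalyzerDemo` are hypothetical stand-ins invented for this illustration. They only mirror the general shape of an abstract base class with a concrete subclass, which is the situation the answer describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an abstract analyzer base class (NOT Lucene's Analyzer).
abstract class SketchAnalyzer {
    // Each concrete subclass decides how raw text is split into tokens.
    abstract List<String> tokenize(String text);

    // Shared helper: true if every token of the term also occurs in the text.
    boolean contains(String text, String term) {
        List<String> termTokens = tokenize(term);
        return !termTokens.isEmpty() && tokenize(text).containsAll(termTokens);
    }
}

// Hypothetical concrete subclass: lowercases, then splits on anything that is not a-z or 0-9.
class LowercaseSketchAnalyzer extends SketchAnalyzer {
    @Override
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }
}

class AbstractAnalyzerDemo {
    public static void main(String[] args) {
        // SketchAnalyzer broken = new SketchAnalyzer(); // would not compile: the class is abstract

        // Declaring the variable with the abstract type is fine; only the
        // constructor call has to name a concrete subclass.
        SketchAnalyzer analyzer = new LowercaseSketchAnalyzer();

        String text = "This is the text to be indexed.";
        System.out.println(analyzer.tokenize(text));         // prints [this, is, the, text, to, be, indexed]
        System.out.println(analyzer.contains(text, "Text")); // prints true
    }
}
```

The same pattern applies to the answer's fix: the variable keeps the type `Analyzer`, and only the constructor call changes to a concrete class such as `EnglishAnalyzer`. The answer does not cover the second error the question mentions. One possible cause, offered here as an assumption to check rather than a diagnosis, is that in Lucene 4.x `QueryParser` lives in the separate lucene-queryparser module under the package `org.apache.lucene.queryparser.classic`, so it needs that jar on the classpath and the matching import.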