Lucene search is not giving me proper results

Time: 2019-12-10 02:48:51

Tags: java lucene

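I have a project that uses Lucene's search functionality. The project has a source folder, and I give it a target folder in which the index is created. The source folder contains several folders, and each of those folders contains several HTML files. I am running a wildcard search over the HTML pages (their HTML content). The search first finds the file paths from the hits, and then pulls the matching search results out of each of those pages.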
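The folder layout described above (a source folder whose sub-folders hold HTML files) can be walked with the JDK's `Files.walk`. The following is a minimal stdlib sketch, not part of the original project: the class `HtmlFileLister` is hypothetical, and restricting the walk to `.html`/`.htm` files is an assumption (the question's `indexDocs` indexes every file it visits).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from the original project): finds the HTML pages
// inside the sub-folders of a source folder.
public class HtmlFileLister {

    // True if the file name ends in .html or .htm, ignoring case.
    public static boolean isHtmlFile(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm");
    }

    // Walks the whole tree below root and returns the HTML files, sorted for a stable order.
    public static List<Path> listHtmlFiles(Path root) throws IOException {
        // Files.walk holds directory handles open, so the stream must be closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> isHtmlFile(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan: first argument, or the current directory if none is given
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path page : listHtmlFiles(root)) {
            System.out.println(page);
        }
    }
}
```

Each returned path could then be handed to the existing `indexDoc` method, so that stray non-HTML files in the source tree are not indexed.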

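My problem is this: the search correctly finds the paths of the files that contain matches, but when it retrieves the result content, it returns a blank value.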

Below are the code snippets I use to create the index and to search it.

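import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexCode 
{
    public static void main(String[] args)
    {
        String docsPath = "Source";   // folder whose sub-folders hold the HTML files
        String indexPath = "target";  // folder in which the index is created
        final Path docDir = Paths.get(docsPath);
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
        // try-with-resources closes the writer and the directory even if indexing fails
        try (Directory dir = FSDirectory.open(Paths.get(indexPath));
             IndexWriter writer = new IndexWriter(dir, iwc))
        {
            indexDocs(writer, docDir);
        } 
        catch (IOException e) 
        {
            e.printStackTrace();
        }
    }

    static void indexDocs(final IndexWriter writer, Path path) throws IOException 
    {
        if (Files.isDirectory(path)) 
        {
            // Visit every file below the source folder, including sub-folders
            Files.walkFileTree(path, new SimpleFileVisitor<Path>() 
            {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException 
                {
                    try
                    {
                        indexDoc(writer, file, attrs.lastModifiedTime().toMillis());
                    } 
                    catch (IOException ioe) 
                    {
                        ioe.printStackTrace();
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } 
        else
        {
            indexDoc(writer, path, Files.getLastModifiedTime(path).toMillis());
        }
    }

    static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException 
    {
        Document doc = new Document();
        // Exact-match path field; also the key used by updateDocument below
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new LongPoint("modified", lastModified));
        // Whole page (HTML tags, CSS and JS included), analyzed and stored for highlighting.
        // Assumes the pages are UTF-8 encoded.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        doc.add(new TextField("contents", content, Field.Store.YES));

        // Replaces any earlier version of the same file in the index
        writer.updateDocument(new Term("path", file.toString()), doc);
    }
}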
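The indexer above stores each page's raw HTML, CSS and JavaScript in `contents`. If only the visible text matters, one option is to strip the markup before the `TextField` is built. Below is a deliberately naive, stdlib-only sketch: `HtmlText.toPlainText` is a hypothetical helper, and regular expressions are not an HTML parser, so a real parser (or Lucene's `HTMLStripCharFilter` from the analyzers-common module) would be more robust.

```java
import java.util.regex.Pattern;

// Hypothetical helper: naive HTML-to-text conversion using regular expressions.
public class HtmlText {
    // <script>...</script> and <style>...</style> elements, bodies included (any case)
    private static final Pattern SCRIPT_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining tag
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPlainText(String html) {
        String text = COMMENT.matcher(html).replaceAll(" ");
        text = SCRIPT_STYLE.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<"
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse the whitespace left behind by removed tags
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String page = "<style>p{color:red}</style><script>var x = 1 < 2;</script><p>New&nbsp;Delhi</p>";
        System.out.println(toPlainText(page)); // prints "New Delhi"
    }
}
```

The string read in `indexDoc` could be passed through `toPlainText` before it is wrapped in the `contents` `TextField`. Because that field is stored, the highlighted fragments would then also come from the plain text rather than from the original markup.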

Now suppose I want to run a wildcard search for "New York". Here is the search code:

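import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.surround.parser.QueryParser;
import org.apache.lucene.queryparser.surround.query.BasicQueryFactory;
import org.apache.lucene.queryparser.surround.query.SrndQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Fragmenter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SearchCode 
{
    private static final String TARGET = "target";

    public static void main(String[] args) throws Exception 
    {
        Directory dir = FSDirectory.open(Paths.get(TARGET));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();

        // Surround syntax: W(...) is an ordered proximity query over the prefix terms new* and del*.
        // The surround parser does not analyze query terms, so they are written in lower case.
        SrndQuery srndquery = QueryParser.parse("W(new*, del*)");
        Query query = srndquery.makeLuceneQueryField("contents", new BasicQueryFactory());

        // At most 10 hits, returned in index order rather than by score
        TopDocs hits = searcher.search(query, 10, Sort.INDEXORDER);
        Formatter formatter = new SimpleHTMLFormatter();
        QueryScorer scorer = new QueryScorer(query);
        Highlighter highlighter = new Highlighter(formatter, scorer);
        // Fragments of roughly 10 characters around each matching span
        Fragmenter fragmenter = new SimpleSpanFragmenter(scorer, 10);
        highlighter.setTextFragmenter(fragmenter);
        for (int i = 0; i < hits.scoreDocs.length; i++) 
        {
            int docid = hits.scoreDocs[i].doc;
            Document doc = searcher.doc(docid);
            String title = doc.get("path");
            System.out.println("Path : " + title);
            String text = doc.get("contents");
            // No term vectors were indexed, so this re-analyzes the stored text with the analyzer
            TokenStream stream = TokenSources.getAnyTokenStream(reader, docid, "contents", analyzer);
            // Up to 10 best-scoring fragments
            String[] frags = highlighter.getBestFragments(stream, text, 10);
            for (String frag : frags) 
            {
                System.out.println("=======================");
                System.out.println(frag);
            }
        }
        reader.close();
        dir.close();
    }
}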

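This code works for plain-text content, but when I use it to search HTML pages that contain CSS code, JS code, HTML tags and content, "frag" sometimes comes back blank (even though the page does contain a match).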
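One factor that may be worth ruling out for large pages (not confirmed as the cause here): Lucene's `Highlighter` only analyzes the first `Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE` characters of the text (50 × 1024) unless `setMaxDocCharsToAnalyze` is raised, and HTML pages with inline CSS and JS can easily be longer than that. The stdlib sketch below does not call Lucene; the class `HighlightWindow` is hypothetical and only checks whether the first case-insensitive occurrence of a term starts beyond such a window.

```java
import java.util.Locale;

// Hypothetical diagnostic: does the first match start past the highlighter's analysis window?
public class HighlightWindow {
    // Same value as Lucene's Highlighter.DEFAULT_MAX_CHARS_TO_ANALYZE (50 * 1024)
    public static final int DEFAULT_WINDOW = 50 * 1024;

    // True if the first case-insensitive occurrence of term starts at or after maxChars.
    // False if the term starts inside the window or does not occur at all.
    // Positions are approximate for the rare characters whose lower-case form changes length.
    public static boolean firstMatchOutsideWindow(String text, String term, int maxChars) {
        int pos = text.toLowerCase(Locale.ROOT).indexOf(term.toLowerCase(Locale.ROOT));
        return pos >= maxChars;
    }

    public static void main(String[] args) {
        // A page whose 60,000 characters of inline CSS push the text past the default window
        StringBuilder page = new StringBuilder("<style>");
        for (int i = 0; i < 60_000; i++) {
            page.append('x');
        }
        page.append("</style><p>New Delhi</p>");
        System.out.println(firstMatchOutsideWindow(page.toString(), "new delhi", DEFAULT_WINDOW)); // true
    }
}
```

If the check returns true for a page whose "frag" output is blank, calling `highlighter.setMaxDocCharsToAnalyze(text.length())` before `getBestFragments` would be one thing to try.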

Please help me solve this, and let me know if you need any other details.

Thanks in advance.

0 answers