Question

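I want to search for a query in the file "fdictionary.txt", which contains a list of words written one per line (230,000 words). Any suggestions as to why this code is not working? The spell-check part works and gives me a list of suggestions (I limited the length of the list to 1). What I want to do is search that dictionary and, if the word already exists, not call the spell checker. My search function does not work, and it gives me no error! Here is what I implemented: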

public class SpellCorrection {

public static File indexDir = new File("/../idxDir");

public static void main(String[] args) throws IOException, FileNotFoundException, CorruptIndexException, ParseException {

    Directory directory = FSDirectory.open(indexDir);
    SpellChecker spell = new SpellChecker(directory);

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_20, null);
    File dictionary = new File("/../fdictionary00.txt");
    spell.indexDictionary(new PlainTextDictionary(dictionary), config, true);


    String query = "red"; //kne, console
    String correctedQuery = query; //kne, console

    if (!search(directory, query)) {
        String[] suggestions = spell.suggestSimilar(query, 1);
        if (suggestions != null) {correctedQuery=suggestions[0];}
    }

    System.out.println("The Query was: "+query);
    System.out.println("The Corrected Query is: "+correctedQuery);
}

public static boolean search(Directory directory, String queryTerm) throws FileNotFoundException, CorruptIndexException, IOException, ParseException {
    boolean isIn = false;

    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_20);

    Term term = new Term(queryTerm);
    Query termQuery = new TermQuery(term);
    TopDocs hits = indexSearcher.search(termQuery, 100);
    System.out.println(hits.totalHits);


    if (hits.totalHits > 0) {
        isIn = true;
    }
    return isIn;
}
}

Answer 1

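Where do you index the contents of fdictionary00.txt?

You can only search with IndexSearcher once you have an index. If you are new to Lucene, you may want to look at a quick tutorial (for example http://lucenetutorial.com/lucene-in-5-minutes.html).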

Answer 2

You never built the index.

You need to set up the index...

Directory directory = FSDirectory.open(indexDir);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_20);
IndexWriter writer = new IndexWriter(directory,analyzer,true,IndexWriter.MaxFieldLength.UNLIMITED );

Then you need to create a document for each word and add the word to it as an analyzed field.

Document doc = new Document();
doc.add(new Field("name", word, Field.Store.YES, Field.Index.ANALYZED));

Then add the document to the index.

writer.addDocument(doc);

writer.optimize();

Now commit the index and close the index writer.

writer.commit();
writer.close();
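
The snippets in this answer use a `word` variable without showing where it comes from. Here is a minimal sketch of reading the dictionary one word per line, using only the JDK. The class name `DictionaryLines`, the method `readWords`, and the policy of trimming lines and skipping blank ones are assumptions made for illustration, not part of the answer. Each returned word would then be wrapped in its own Document and added through the writer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DictionaryLines {

    // Reads one word per line, trimming whitespace and skipping blank lines
    // (assumed policy; adjust if the dictionary needs different handling).
    public static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the dictionary file; a FileReader on the
        // real file would be passed here instead.
        List<String> words = readWords(new StringReader("red\n\n green \nblue\n"));
        System.out.println(words); // prints [red, green, blue]
    }
}
```

Taking a `Reader` rather than a file path keeps the sketch easy to try without the real 230,000-word file.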

Answer 3

You can keep a SpellChecker instance available in a service and use spellChecker.exist(word).
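
The control flow the question is after (check whether the word exists, and ask for a suggestion only if it does not) can be sketched independently of Lucene. In this sketch the class `QueryCorrector`, the method `correct`, and the two functional parameters are hypothetical names; the in-memory word set and the fixed suggestion in `main` are stand-ins for the real checker. It also guards against an empty suggestion array, which the `suggestions != null` test in the question does not cover.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

public class QueryCorrector {

    // Returns the query unchanged when it is a known word; otherwise returns
    // the first suggestion, or the original query if there is no suggestion.
    public static String correct(String query,
                                 Predicate<String> exists,
                                 Function<String, String[]> suggest) {
        if (exists.test(query)) {
            return query; // known word: the suggestion step is skipped
        }
        String[] suggestions = suggest.apply(query);
        // Guard against both null and an empty array before reading index 0.
        if (suggestions != null && suggestions.length > 0) {
            return suggestions[0];
        }
        return query;
    }

    public static void main(String[] args) {
        // Stand-ins for the real checker (assumptions for illustration only).
        Set<String> known = new HashSet<>(Arrays.asList("red", "console"));
        Function<String, String[]> suggest = w -> new String[] {"console"};

        System.out.println(correct("red", known::contains, suggest));    // prints red
        System.out.println(correct("consle", known::contains, suggest)); // prints console
    }
}
```

With the real SpellChecker, `exist(word)` and `suggestSimilar(word, 1)` would take the place of the two stand-ins. Both declare IOException, so the lambdas would need to handle or wrap it.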

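Note that SpellChecker does not index words of 2 characters or fewer. To work around this, you can add them to the index after it has been created (add them to the SpellChecker.F_WORD field).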
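
To see which dictionary entries are affected by this length limit, the word list can be filtered before indexing. This is a minimal sketch; the class `ShortWords`, the method `findShortWords`, and the constant name are hypothetical, and the threshold of 2 is taken from the statement above rather than read from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShortWords {

    // Threshold taken from the answer: words of 2 characters or fewer
    // are reported as not being indexed by SpellChecker.
    static final int MAX_SKIPPED_LENGTH = 2;

    // Returns the words that would need to be added to the index separately.
    public static List<String> findShortWords(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() <= MAX_SKIPPED_LENGTH) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "to", "red", "console");
        System.out.println(findShortWords(words)); // prints [a, to]
    }
}
```

The resulting list is the set of words that would have to be added to the F_WORD field by hand for exist(word) to find them.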

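If you add a word to a live index and want exist(word) to find it, you need to add it to the SpellChecker.F_WORD field. Of course, because you are not adding it to all the other fields (gram, start, end and so on), your word will not show up as a suggestion for other misspelled words.

In that case you have to add the word to your dictionary file, so that it becomes available as a suggestion when the index is recreated. It would be nice if the project made SpellChecker.createDocument(...) public or protected rather than private, since that method does everything needed to add a word.

After all of this, you need to call spellChecker.setSpellIndex(directory).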

How to search a file with Lucene
