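I need help setting the character set for the Lucene spell checker (version 3.6 of both Lucene core and the spellchecker). My dictionary ("D:\dictionary.txt") contains both English and Russian words. My code works for English text: for example, it returns the correctly spelled word 'hello'. It does not work for Russian, though. When I misspell a Russian word, the program throws an exception (Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0), and it cannot find any suggestions for Russian words.
Here is my code: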
RAMDirectory spellCheckerDir = new RAMDirectory();
SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("D:\\dictionary.txt")), "UTF-8");
PlainTextDictionary dictionary = new PlainTextDictionary(isr);
spellChecker.indexDictionary(dictionary, config, true);
String[] suggestions = spellChecker.suggestSimilar("hwllo", 1); // 'hello' misspelled as 'hwllo'
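Answer 0: (score: 0)
This is the best option I can offer based on your code (which was useful, thanks). I simply loaded two dictionaries; it also works with a single combined file.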
RAMDirectory spellCheckerDir = new RAMDirectory();
SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, analyzer);
InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/English/words.english")), "UTF-8");
PlainTextDictionary dictionary = new PlainTextDictionary(isr);
spellChecker.indexDictionary(dictionary, config, true);
isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/Swedish/words.swedish")), "UTF-8");
PlainTextDictionary swdictionary = new PlainTextDictionary(isr);
spellChecker.indexDictionary(swdictionary, config, true);
String wordForSuggestions = "hwllo"; // 'hello' misspelled as 'hwllo'
int suggestionsNumber = 5;
String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber);
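Two things may be worth checking on the Russian side. This is only a sketch under assumptions the question does not confirm: that the ArrayIndexOutOfBoundsException comes from reading suggestions[0] when the result array is empty (that line is not in the posted code), and that the dictionary file might have been saved in a non-UTF-8 encoding such as windows-1251. The class and method names below (SpellCheckDiagnostics, firstOrNull, isValidUtf8) are hypothetical helpers that use only the Java standard library (Java 7+), not part of Lucene.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SpellCheckDiagnostics {

    // Hypothetical helper: returns the first suggestion, or null when the
    // array is null or empty, instead of indexing [0] unconditionally.
    static String firstOrNull(String[] suggestions) {
        return (suggestions == null || suggestions.length == 0) ? null : suggestions[0];
    }

    // Hypothetical helper: returns true only if the bytes decode as strict UTF-8.
    // A file saved in another encoding (assumption: e.g. windows-1251) will
    // usually fail this check for Cyrillic text.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // An empty result yields null here rather than throwing.
        String best = firstOrNull(new String[0]);
        System.out.println(best == null ? "no suggestions" : best);

        // Optionally pass the dictionary path as the first argument.
        if (args.length > 0) {
            byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
        }
    }
}
```

With the code above, the result could be read as firstOrNull(suggestions) instead of suggestions[0]. If the dictionary turns out not to be valid UTF-8, re-saving it as UTF-8 (or passing its real encoding to InputStreamReader) would be the next thing to try.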