How do I create a dictionary in LanguageTool and use it for spell checking?

Date: 2016-05-24 11:23:13

Tags: java dictionary spell-checking languagetool

How do I create a spell-checking dictionary with LanguageTool? I am not a Java programmer, and this is the first time I have looked at LT.

2 answers:

Answer 0 (score: 3)

Hi, here is my experience of creating a spell-checking dictionary with LanguageTool. I hope it helps.

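Part 1: How to create the dictionary

You need:

• A .txt file containing the dictionary

• A .info file that tells LT how to set up the output file (one already exists in the LT directory)

• LanguageTool stand-alone

• Java 8

By the end of this part you will have:

• A .dict file, i.e. your dictionary in a form that LT can read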

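  1. Install the latest version of LT: https://languagetool.org/download/snapshots/?C=M;O=D
  2. Make sure your .txt file has the correct (a) format and (b) encoding:
     a. one word per line
     b. UTF-8 encoding
  3. On the command line, run:
     java -cp languagetool.jar org.languagetool.tools.SpellDictionaryBuilder fr_FR -i <path to the dictionary file> -info <path to the .info file> -o <path to the output file>
  4. Where:

    i. fr_FR is the code of the dictionary's language

    ii. -i is the parameter for the input file (.txt)

    iii. -info is the parameter for the .info file that belongs to the dictionary. You can create it by following these instructions (http://wiki.languagetool.org/hunspell-support, section "Configuring the dictionary") or use the .info file that already exists, if there is one, under \org\languagetool\resource\yourlanguage

    iv. -o is the parameter that specifies where the .dict output file should be saved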
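Step 2 (one word per line, UTF-8) is easy to get wrong, so it can help to check the word list before running SpellDictionaryBuilder. Below is a minimal sketch of such a check. It is not part of LanguageTool: the class name WordListCleaner and its behaviour (trimming, dropping blank lines and duplicates, rejecting lines that contain whitespace) are my own assumptions about what a clean input looks like.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not shipped with LanguageTool.
public class WordListCleaner {

    /** Trims every line, drops blank lines and duplicates, and keeps first-seen order. */
    static List<String> clean(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (int i = 0; i < lines.size(); i++) {
            String word = lines.get(i);
            // Some editors write a byte order mark at the start of a UTF-8 file.
            if (i == 0 && word.startsWith("\uFEFF")) {
                word = word.substring(1);
            }
            word = word.trim();
            if (word.isEmpty()) {
                continue;
            }
            // Assumption: "one word per line" means no whitespace inside an entry.
            if (word.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + " contains more than one word: " + word);
            }
            words.add(word);
        }
        return new ArrayList<>(words);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: java WordListCleaner <input.txt> <output.txt>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // readAllLines throws MalformedInputException if the file is not valid UTF-8.
        List<String> words = clean(Files.readAllLines(in, StandardCharsets.UTF_8));
        Files.write(out, words, StandardCharsets.UTF_8);
        System.out.println(words.size() + " words written to " + out);
    }
}
```

Run it on your word list first, then pass the cleaned output file to the -i parameter in step 3.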

Part 2: How to integrate the dictionary into LT for spell checking

You need:

• JDK 1.8 (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

• Maven (https://maven.apache.org/download.cgi)

• A Java IDE (JetBrains, Eclipse, etc.)

• The .info file and the .dict file (see Part 1)

• The LanguageTool project from GitHub (https://github.com/languagetool-org/languagetool)

  1. Set up the JDK and Maven bin paths (more information: https://maven.apache.org/install.html)
  2. Copy the .info and .dict files created in Part 1 to \languagetool-master\languagetool-language-modules\YourLanguage\src\main\resources\org\languagetool\resource\YourLanguage\hunspell
  3. In your IDE, open the Java file named after the dictionary's language (e.g. French.java) and make the following changes:

      a. In YourLanguage.java, replace HunspellNoSuggestionRule with MorfologikYourLanguageSpellerRule:

      @Override
      public List<Rule> getRelevantRules(ResourceBundle messages) throws IOException {
        return Arrays.asList(
            new CommaWhitespaceRule(messages),
            new DoublePunctuationRule(messages),
            new GenericUnpairedBracketsRule(messages,
                Arrays.asList("[", "(", "{" /*"«", "‘"*/),
                Arrays.asList("]", ")", "}"
                    /*"»", French dialog can contain multiple sentences. */
                    /*"’" used in "d’arm" and many other words */)),
            new MorfologikYourLanguageSpellerRule(messages, this),
            new UppercaseSentenceStartRule(messages, this),
            new MultipleWhitespaceRule(messages, this),
            new SentenceWhitespaceRule(messages),
            // specific to French:
            new CompoundRule(messages),
            new QuestionWhitespaceRule(messages)
        );
      }
      

      b. In \languagetool-master\languagetool-language-modules\YourLanguage\src\main\java\org\languagetool\rules\YourLanguage, create a new file MorfologikYourLanguageSpellerRule.java:

      /* LanguageTool, a natural language style checker
       * Copyright (C) 2012 Marcin Miłkowski (http://www.languagetool.org)
       *
       * This library is free software; you can redistribute it and/or
       * modify it under the terms of the GNU Lesser General Public
       * License as published by the Free Software Foundation; either
       * version 2.1 of the License, or (at your option) any later version.
       *
       * This library is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       * Lesser General Public License for more details.
       *
       * You should have received a copy of the GNU Lesser General Public
       * License along with this library; if not, write to the Free Software
       * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301
       * USA
       */
      
      package org.languagetool.rules.fr; // "fr" is the French example; use the package of your own language
      
      import java.io.IOException;
      import java.util.ResourceBundle;
      
      import org.languagetool.Language;
      import org.languagetool.rules.spelling.morfologik.MorfologikSpellerRule;
      
      public final class MorfologikYourLanguageSpellerRule extends MorfologikSpellerRule {
      
          public static final String RULE_ID = "MORFOLOGIK_RULE_CODEOFYOURLANGUAGE"; /* e.g. Fr_FR for French */
      
          private static final String RESOURCE_FILENAME = "PATH TO YOUR .DICT FILE";
      
          // The constructor name must match the class name, otherwise the file does not compile.
          public MorfologikYourLanguageSpellerRule(ResourceBundle messages,
                                                   Language language) throws IOException {
              super(messages, language);
          }
      
          @Override
          public String getFileName() {
              return RESOURCE_FILENAME;
          }
      
          @Override
          public String getId() {
              return RULE_ID;
          }
      }
      

      c. On the command line, go to \languagetool-master\ and run: mvn package

      d. Check the result in \languagetool-master\languagetool-standalone\target\LanguageTool-3.4-SNAPSHOT\LanguageTool-3.4-SNAPSHOT.

Answer 1 (score: 0)

As an alternative, I created a GUI program that makes it easier to carry out the steps in @KeyPi's answer. You can find it here.