Postgres 9.6.1 full-text search dictionaries for most spoken languages

Time: 2017-01-18 09:27:22

Tags: postgresql dictionary full-text-search

I am trying to run full-text search operations such as to_tsvector and to_tsquery, and I need dictionaries for roughly 80 or more languages.

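Postgres seems to ship only 16 language configurations, plus the two I am testing for Chinese (jiebacfg and testzhcfg, the latter also known as zhparser). I am looking for documentation or a repository covering the other languages.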

mydatabase=# \dF

               List of text search configurations
   Schema   |    Name    |              Description              
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | jiebacfg   | configuration for jieba
 public     | testzhcfg  | 
(18 rows)

1 answer:

Answer 0 (score: 2)

As pozs noted, you can get dictionary files from OpenOffice (or LibreOffice) extensions. From the documentation:

  

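To create an Ispell dictionary, perform these steps:

  • Download the dictionary configuration files. OpenOffice extension files have the .oxt extension. Extract the .aff and .dic files and change their extensions to .affix and .dict. Some dictionary files also need to be converted to UTF-8 encoding, for example for a Norwegian dictionary: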

iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic

     
      
  • Copy the files to the $SHAREDIR/tsearch_data directory.

  • Load the files into PostgreSQL with the following command:

CREATE TEXT SEARCH DICTIONARY english_hunspell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    Stopwords = english);
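
When the file-preparation steps above have to be repeated for many languages, they can be wrapped in a small shell helper. The following is only a sketch under stated assumptions: prepare_dict is a hypothetical name, the lower-casing simply mirrors the nn_NO to nn_no renaming in the quoted example, and the source encoding of each dictionary still has to be checked by hand.

```shell
#!/bin/sh
# prepare_dict SRC_DIR BASENAME OUT_DIR [SOURCE_ENCODING]
# Hypothetical helper: writes BASENAME.aff/.dic from SRC_DIR into OUT_DIR
# as <name>.affix/<name>.dict, converting to UTF-8 when an encoding is given.
prepare_dict() {
    src_dir=$1
    base=$2
    out_dir=$3
    enc=${4:-UTF-8}
    # Keep the target name to lower-case letters, digits and underscores,
    # following the renaming shown in the quoted example (nn_NO -> nn_no).
    name=$(printf '%s' "$base" | tr 'A-Z-' 'a-z_')
    for pair in aff:affix dic:dict; do
        from=$src_dir/$base.${pair%%:*}
        to=$out_dir/$name.${pair##*:}
        if [ "$enc" = "UTF-8" ]; then
            # Already UTF-8: a plain copy is enough.
            cp "$from" "$to" || return 1
        else
            iconv -f "$enc" -t UTF-8 "$from" > "$to" || return 1
        fi
    done
}

# Example (not run here; the paths are placeholders):
#   unzip -o dictionary.oxt -d /tmp/nn     # an .oxt file is a ZIP archive
#   prepare_dict /tmp/nn nn_NO "$(pg_config --sharedir)/tsearch_data" ISO_8859-1
```

pg_config --sharedir prints the directory the documentation calls $SHAREDIR. Inside an extracted .oxt the .aff and .dic files may sit in a subdirectory, so SRC_DIR may need adjusting per dictionary.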

There is also a list of extensions that provide an easy way to install dictionaries. You can download them from GitHub.
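
With 80 or so languages, it may be easier to generate the CREATE TEXT SEARCH DICTIONARY statement quoted above than to type it for each one. The sketch below prints the SQL rather than running it. dict_sql and config_sql are hypothetical helper names, and the second function assumes that you also want a text search configuration (copied from simple) that maps the usual word token types to the new dictionary, which the quoted steps do not cover.

```shell
#!/bin/sh
# dict_sql DICT_NAME FILE_BASE [STOPWORDS]
# Prints a CREATE TEXT SEARCH DICTIONARY statement in the form quoted above.
# The StopWords line is left out when no stop-word list is given.
dict_sql() {
    dict=$1
    base=$2
    stop=${3:-}
    printf 'CREATE TEXT SEARCH DICTIONARY %s (\n' "$dict"
    printf '    TEMPLATE = ispell,\n'
    printf '    DictFile = %s,\n' "$base"
    if [ -n "$stop" ]; then
        printf '    AffFile = %s,\n' "$base"
        printf '    StopWords = %s);\n' "$stop"
    else
        printf '    AffFile = %s);\n' "$base"
    fi
}

# config_sql CONFIG_NAME DICT_NAME
# Assumption: a new configuration copied from pg_catalog.simple that tries
# the Ispell dictionary first and falls back to the simple dictionary.
config_sql() {
    cfg=$1
    dict=$2
    printf 'CREATE TEXT SEARCH CONFIGURATION %s (COPY = pg_catalog.simple);\n' "$cfg"
    printf 'ALTER TEXT SEARCH CONFIGURATION %s\n' "$cfg"
    printf '    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part\n'
    printf '    WITH %s, simple;\n' "$dict"
}

# Reproduces the statement from the answer on standard output.
dict_sql english_hunspell en_us english

# Example (not run here): pipe the generated SQL into psql.
#   { dict_sql nn_hunspell nn_no; config_sql public.nn nn_hunspell; } | psql mydatabase
```

The fallback to simple is there because an Ispell dictionary returns nothing for words it does not recognize, so without a second dictionary those words would be dropped from the tsvector.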