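I'm trying to run full-text search operations such as to_tsvector, to_tsquery, etc., and I need dictionaries for roughly 80+ languages.
Postgres appears to ship with only 16 text search configurations, plus the two I'm testing for Chinese (jiebacfg and testzhcfg, a.k.a. ZHParse). I'm looking for documentation or a repository covering other languages.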
mydatabase=# \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
 public     | jiebacfg   | configuration for jieba
 public     | testzhcfg  |
(18 rows)
Answer 0 (score: 2)
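As pozs mentioned, you can get dictionary files from OpenOffice (or LibreOffice) extensions. From the documentation:
To create an Ispell dictionary, perform these steps:
- Download the dictionary configuration files. OpenOffice extension files have the .oxt extension. You need to extract the .aff and .dic files and change their extensions to .affix and .dict. For some dictionary files you also need to convert the characters to UTF-8 encoding (for example, for a Norwegian language dictionary):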
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.affix nn_NO.aff
iconv -f ISO_8859-1 -t UTF-8 -o nn_no.dict nn_NO.dic
- Copy the files to the $SHAREDIR/tsearch_data directory
- Load the files into PostgreSQL with the following command:
CREATE TEXT SEARCH DICTIONARY english_hunspell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    Stopwords = english);
There is also a list of extensions that provide an easy way to install dictionaries. You can download them from GitHub.