Custom postgres FTS dictionary

Time: 2015-01-16 21:27:41

Tags: postgresql full-text-search

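We have built software that searches accounting data (orders, customers, etc.), and this data usually carries user-visible numbers padded with leading zeros. Users want to use full-text search without having to type all the leading zeros. For example, a search for "12345" should match "0000012345".

It seems to me that the most elegant solution would be a custom dictionary that operates on uint tokens. Unfortunately, I am having trouble finding any documentation on writing the lexize function. Ideally, I would like to write such a function in SQL or pl/SQL rather than having to resort to maintaining a C extension.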

1 answer:

Answer 0: (score: 3)

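You can create your own dictionary template. To do so, create three files, zero_dict.c, zero_dict.sql.in, and Makefile, and copy them into the directory "contrib/zero_dict" of the PostgreSQL source tree.

File zero_dict.c: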

#include "postgres.h"
#include "fmgr.h"
#include "tsearch/ts_public.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(dzero_init);
Datum dzero_init(PG_FUNCTION_ARGS);

Datum
dzero_init(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(NULL);
}

PG_FUNCTION_INFO_V1(dzero_lexize);
Datum dzero_lexize(PG_FUNCTION_ARGS);

Datum
dzero_lexize(PG_FUNCTION_ARGS)
{
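    char       *in = (char *) PG_GETARG_POINTER(1);
    int32       len = PG_GETARG_INT32(2);
    char       *txt;
    TSLexeme   *res;
    int32       n = 0;

    /* Count leading zeros; the token is not NUL-terminated, so stay within len. */
    while (n < len && in[n] == '0')
        n++;

    if (n > 0 && n < len)
    {
        /* Return the token without its leading zeros; the zeroed second entry ends the array. */
        txt = pnstrdup(in + n, len - n);
        res = palloc0(sizeof(TSLexeme) * 2);
        res[0].lexeme = txt;
        PG_RETURN_POINTER(res);
    }
    else
    {
        /* No leading zeros, or nothing but zeros: report the token as unrecognized. */
        PG_RETURN_POINTER(NULL);
    }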
}
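The zero-stripping rule used by dzero_lexize can also be tried outside the server. The following is only a minimal sketch in plain C with no PostgreSQL headers; the helper name leading_zeros_to_strip is hypothetical and exists only for illustration. It takes an explicit length and never reads past it, on the assumption that the token handed to a lexize function is not NUL-terminated.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of PostgreSQL: returns how many leading
 * '0' bytes should be dropped from a token of length len. Returns 0 when
 * the token should be left alone, i.e. it has no leading zeros or it
 * consists of zeros only. Reads at most len bytes from in.
 */
static size_t
leading_zeros_to_strip(const char *in, size_t len)
{
    size_t n = 0;

    /* Count leading zeros without running past the end of the token. */
    while (n < len && in[n] == '0')
        n++;

    /* If every byte was a zero (or len is 0), strip nothing. */
    return (n < len) ? n : 0;
}
```

With this rule, the token "0000012345" loses five bytes and becomes "12345", while "12345" and "0000" are left untouched, which corresponds to the cases where the dictionary returns NULL.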

File zero_dict.sql.in:

SET search_path = public;
BEGIN;

CREATE OR REPLACE FUNCTION dzero_init(internal)
     returns internal
     as 'MODULE_PATHNAME'
     language C;

CREATE OR REPLACE FUNCTION dzero_lexize(internal,internal,internal,internal)
    returns internal
    as 'MODULE_PATHNAME'
    language C
    strict;

CREATE TEXT SEARCH TEMPLATE zerodict(
    LEXIZE = dzero_lexize,
    INIT = dzero_init);

END;

File Makefile:

subdir = contrib/zero_dict
top_builddir = ../..
include $(top_builddir)/src/Makefile.global

MODULE_big = zero_dict
OBJS =  zero_dict.o
DATA_built = zero_dict.sql
DOCS =

include $(top_srcdir)/contrib/contrib-global.mk

Then run the following commands:

make
make install
psql DBName < zero_dict.sql

To create the dictionary:

create text search dictionary zerodict (template=zerodict);

You can then run a query:

dicts=# select ts_lexize('zerodict', '0000012345');
 ts_lexize 
-----------
 {12345}
(1 row)

For details, see: http://www.sai.msu.su/~megera/postgres/fts/fts.pdf