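User input can be in either English or Italian. The data is also in both English and (mostly) Italian. Below is my query, which seems to work; my question is whether this is the right way to handle input in an unknown language. (In the example, the user enters the word "wine"):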
SELECT id, name
FROM (
    SELECT p.id, p.name,
           to_tsvector('italian', coalesce(p.name, '')) ||   -- some data are only in Italian
           to_tsvector('italian', coalesce(cat.category, '')) ||
           to_tsvector((CASE WHEN de.language = 'ITA' THEN 'italian' ELSE 'english' END)::regconfig,
                       coalesce(string_agg(de.descr, ' '), '')) AS document
           -- coalesce(..., ''): a single NULL part would make the whole || concatenation NULL
    FROM myschema.product p
    INNER JOIN myschema.disc d ON d.id_disc = p.id_disc
    INNER JOIN myschema.disc_city dc ON dc.id_disc = d.id_disc
    INNER JOIN myschema.city c ON c.id_city = dc.id_city
    INNER JOIN myschema.category cat ON cat.id_category = d.id_category
    INNER JOIN myschema.product_desc pd ON pd.id = p.id   -- one p.id to many pd.id: a product can have several descriptions
    INNER JOIN myschema.descr de ON de.id_descr = pd.id_descr
    GROUP BY p.id, p.name, cat.category, de.language
) p_search
-- handle the input 'wine' of unknown language (it could also be the Italian 'vino')
WHERE p_search.document @@ to_tsquery('italian', 'wine')
   OR p_search.document @@ to_tsquery('english', 'wine')
-- the subquery returns one row per product and description language, so collapse them here
GROUP BY id, name;
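A minimal sketch of the same technique on inline data, so it can be run without the real schema (the rows and the `lang` column are hypothetical, not taken from `myschema`): the two stemmed queries can also be OR-ed into a single tsquery with the `||` operator, which is equivalent to two `@@` tests joined by `OR`.

```sql
-- Hypothetical sample rows standing in for the real product data
WITH sample(id, name, lang) AS (
    VALUES (1, 'Vino rosso della casa', 'italian'),
           (2, 'House red wine',        'english'),
           (3, 'Olio extravergine',     'italian')
),
docs AS (
    -- each row is parsed with the configuration named in its lang column
    SELECT id, name, to_tsvector(lang::regconfig, name) AS document
    FROM sample
)
SELECT id, name
FROM docs
-- tsquery || tsquery means "q1 OR q2", so one @@ test covers both languages
WHERE document @@ (to_tsquery('italian', 'wine') || to_tsquery('english', 'wine'))
ORDER BY id;
```

On these rows only row 2 should come back. Each configuration stems 'wine' with its own rules and neither maps it to the lexeme of 'vino', so OR-ing the two configurations does not translate the search term.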
Answer 0 (score: 0)
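You can test how each configuration handles both languages, including 'simple', which only lowercases words and neither stems them nor removes stop words: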
SELECT to_tsvector('english', 'The wine is good');
SELECT to_tsvector('italian', 'The wine is good');
SELECT to_tsvector('simple', 'The wine is good');
SELECT to_tsvector('english', 'Il vino è buono');
SELECT to_tsvector('italian', 'Il vino è buono');
SELECT to_tsvector('simple', 'Il vino è buono');
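A small sketch of the trade-off, on hypothetical inline rows: a 'simple' query needs no language choice, but because nothing is stemmed, 'wine' no longer matches 'wines', which the 'english' configuration does match.

```sql
-- Hypothetical inline data, only to compare the two configurations
WITH sample(id, descr) AS (
    VALUES (1, 'Il vino è buono'),
           (2, 'The wine is good'),
           (3, 'Two great wines')
)
SELECT id,
       -- 'simple' only lowercases: no stemming, no stop words, no language rules
       to_tsvector('simple', descr)  @@ to_tsquery('simple', 'wine')  AS simple_match,
       -- 'english' stems, so 'wines' and 'wine' share the lexeme 'wine'
       to_tsvector('english', descr) @@ to_tsquery('english', 'wine') AS english_match
FROM sample
ORDER BY id;
```

Expected: row 1 matches neither configuration, row 2 matches both, and row 3 matches only with 'english'.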
Answer 1 (score: 0)
With PostgreSQL you can create your own dictionary.
The file wine.stop contains the dictionary's stop words, one per line:
wine
merlot
carmenere
...
This file must be placed at $SHAREDIR/tsearch_data/wine.stop. Run pg_config --sharedir to find $SHAREDIR.
Then create the text search dictionary and a configuration that uses it:
CREATE TEXT SEARCH DICTIONARY public.wine_dict (
TEMPLATE = pg_catalog.simple,
STOPWORDS = wine
);
CREATE TEXT SEARCH CONFIGURATION wine_dict(parser = default);
ALTER TEXT SEARCH CONFIGURATION wine_dict
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH wine_dict;
SELECT to_tsvector('wine_dict', 'The wine is good');
Result ('wine' is dropped because it is listed in wine.stop):
'good':4 'is':3 'the':1
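The wine.stop setup cannot be tried without copying a file into the server's share directory. A sketch of the same mechanism that runs as-is, assuming the `english.stop` file that ships with PostgreSQL in the same `tsearch_data` directory; `demo_stop_dict` and `demo_stop_cfg` are hypothetical names:

```sql
-- Same mechanism as wine_dict, but the stop list is the bundled english.stop
CREATE TEXT SEARCH DICTIONARY public.demo_stop_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

-- A configuration using the default parser; it starts with no mappings
CREATE TEXT SEARCH CONFIGURATION public.demo_stop_cfg (PARSER = default);

-- Route word tokens through the dictionary; unmapped token types are ignored
ALTER TEXT SEARCH CONFIGURATION public.demo_stop_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH public.demo_stop_dict;

-- 'the' and 'is' are in english.stop, so only 'wine' and 'good' remain
SELECT to_tsvector('public.demo_stop_cfg', 'The wine is good');
```

This should return 'good':4 'wine':2. Positions still count the removed stop words, just as in the wine_dict result above.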