Question

I want to create n-grams from a text column in PostgreSQL. I currently split the data (sentences) in the text column on whitespace into an array:

select regexp_split_to_array(sentenceData, E'\\s+') from tableName;

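Once I have this array, how do I go about:

Creating a loop to find the n-grams, and writing each one to a row in another table?

Using unnest I can get all the elements of all the arrays on separate rows, and maybe I could then think of a way to get n-grams from a single column, but I would lose the sentence boundaries, which I want to preserve.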

Sample SQL code for PostgreSQL to reproduce the scenario above:

create table tableName(sentenceData  text);

INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');

INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');

INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');

select regexp_split_to_array(sentenceData,E'\\s+')   from tableName;

select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;

Answer 1

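Check out pg_trgm: "The pg_trgm module provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings."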
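As a rough sketch of how this could look against the sample table from the question: pg_trgm's show_trgm function returns character-level trigrams, which may not be what is wanted if the goal is word-level n-grams. The second query is therefore an alternative that does not use pg_trgm at all; it slices the word array per row, so n-grams never cross a sentence boundary. It assumes PostgreSQL 9.3 or later (for the implicit lateral reference) and that the pg_trgm extension can be installed on the server. The column aliases (trigrams, words, pos, ngram) are illustrative names, not part of the original question.

```sql
-- Character-level trigrams via pg_trgm (assumes the extension is installable).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT sentenceData,
       show_trgm(sentenceData) AS trigrams   -- text[] of character trigrams
FROM tableName;

-- Word-level n-grams without pg_trgm, one row per n-gram.
-- Change both 2s below to n for n-grams of another size.
SELECT s.sentenceData,
       i AS pos,                                         -- position of the first word
       array_to_string(s.words[i : i + 2 - 1], ' ') AS ngram
FROM (
    SELECT sentenceData,
           regexp_split_to_array(sentenceData, E'\\s+') AS words
    FROM tableName
) AS s,
     -- one start index per n-gram; yields no rows for sentences shorter than n
     generate_series(1, array_length(s.words, 1) - 2 + 1) AS i
ORDER BY s.sentenceData, i;
```

Because each row's array is sliced on its own, the original sentence stays attached to every n-gram. To write the results to another table, the second query could be wrapped in an INSERT INTO ... SELECT against a target table of your choosing.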
