I am looking to create n-grams from a text column in PostgreSQL. I currently split the data (sentences) in the text column on whitespace into an array:
select regexp_split_to_array(sentenceData,E'\\s+') from tableName;
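Once I have this array, how do I go about producing the n-grams from it?
Using unnest I can obtain all the elements of all the arrays on separate rows, and maybe I could then think of a way to get n-grams from a single column, but I would lose the sentence boundaries, which I wish to preserve.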
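To illustrate the boundary concern, here is a minimal sketch of how unnest can keep track of which sentence each word came from. It assumes PostgreSQL 9.4 or later (for WITH ORDINALITY), and the aliases w, word and idx are illustrative names only, not part of the original schema.

```sql
-- Sketch: one row per word, with the source sentence and the word's
-- position carried along, so the sentence boundary is not lost.
-- Assumes PostgreSQL 9.4+; "w", "word" and "idx" are made-up aliases.
SELECT t.sentenceData,
       w.idx,
       w.word
FROM tableName AS t
CROSS JOIN LATERAL unnest(regexp_split_to_array(t.sentenceData, E'\\s+'))
     WITH ORDINALITY AS w(word, idx)
ORDER BY t.sentenceData, w.idx;
```

Because the table has no key column, the sentence text itself identifies the sentence here, so two identical sentences would be indistinguishable. A surrogate id column (hypothetical, not in the table as given) would be a cleaner way to group words.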
Sample SQL code for PostgreSQL to reproduce the above scenario:
create table tableName(sentenceData text);
INSERT INTO tableName(sentenceData) VALUES('This is a long sentence');
INSERT INTO tableName(sentenceData) VALUES('I am currently doing grammar, hitting this monster book btw!');
INSERT INTO tableName(sentenceData) VALUES('Just tonnes of grammar, problem is I bought it in TAIWAN, and so there aint any englihs, just chinese and japanese');
select regexp_split_to_array(sentenceData,E'\\s+') from tableName;
select unnest(regexp_split_to_array(sentenceData,E'\\s+')) from tableName;
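For reference, one possible direction that stays within each sentence is to slice the array directly rather than unnest it. The following is only a sketch for bigrams (n = 2), assuming PostgreSQL 9.3 or later (for LATERAL); the names tokens, words, i and start_pos are illustrative and not part of the original schema.

```sql
-- Sketch: bigrams per sentence via array slicing, so no n-gram
-- ever spans two sentences. Replace both literal 2s to change n.
-- "tokens", "words", "i" and "start_pos" are made-up names.
WITH tokens AS (
    SELECT sentenceData,
           regexp_split_to_array(sentenceData, E'\\s+') AS words
    FROM tableName
)
SELECT t.sentenceData,
       i AS start_pos,
       -- words[i : i + n - 1] is the slice of n consecutive words
       array_to_string(t.words[i : i + 2 - 1], ' ') AS ngram
FROM tokens AS t
-- one row per valid start position: 1 .. (word count - n + 1)
CROSS JOIN LATERAL generate_series(1, array_length(t.words, 1) - 2 + 1) AS i
ORDER BY t.sentenceData, i;
```

Sentences with fewer than n words yield no rows, since generate_series returns an empty set when the upper bound is below the lower one. Punctuation stays attached to the words (for example 'grammar,'), because the split is on whitespace only.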