I have a table 'posts' as follows:
id | title | description
---+----------+---------------------------------------------
1 | title1 | A short description about title1
2 | title2 | this is about title2
3 | title2 | this is some other description
4 | title1 | this is different from previous description
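I have two questions.
First, I need the count of each word across all descriptions of a given title. I tried using ts_stat(),
but it gives me word frequencies for the whole column (regardless of which title a row belongs to).
select ts_stat($$select to_tsvector('simple', posts.description) from posts$$);
Second, I am looking for help creating a new table in which each row contains title, word, count.
Initially I considered storing one row per title, with a column holding the comma-separated words and their counts, but getting the count of a word for a given title would then take some extra work, so I thought of adding one row for each word of each title instead.
If there is a better approach, please let me know.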
version: PostgreSQL 9.5.8
Answer 0 (score: 0)
I could not think of anything less monstrous:
t=# with c as (
select to_tsvector('simple',unnest(string_to_array(description,' '))),title
from posts
)
, d as (
select translate(split_part(to_tsvector::text,':',1),$$'$$,'') ts,title
from c
where octet_length(to_tsvector::text) > 0
)
select ts,title,count(1)
from d
group by title,ts
order by 1;
ts | title | count
-------------+------------+-------
a | title1 | 1
about | title1 | 1
about | title2 | 1
description | title1 | 2
description | title2 | 1
different | title1 | 1
from | title1 | 1
is | title2 | 2
is | title1 | 1
other | title2 | 1
previous | title1 | 1
short | title1 | 1
some | title2 | 1
this | title1 | 1
this | title2 | 2
title1 | title1 | 1
title2 | title2 | 1
(17 rows)
To reconcile with the totals over the whole column:
t=# select ts_stat('select to_tsvector($$simple$$,description) from posts') order by 1 ;
ts_stat
-------------------
(a,1,1)
(about,2,2)
(description,3,3)
(different,1,1)
(from,1,1)
(is,3,3)
(other,1,1)
(previous,1,1)
(short,1,1)
(some,1,1)
(this,3,3)
(title1,1,1)
(title2,1,1)
(13 rows)
But then again, my experience with FTS is very limited; you can probably do better with the ts_ functions.
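As one possible way to stay within the ts_ functions, here is a minimal sketch that calls ts_stat() once per title through a LATERAL join and stores the result as the (title, word, count) table the question asks for. The table name word_counts is hypothetical, and the sketch assumes PostgreSQL 9.3 or later (for LATERAL), which covers 9.5.8. It has not been run against the sample data above.

```sql
-- Sketch only: word_counts is a hypothetical table name, not from the question.
-- ts_stat() executes the query text it is given, so it is called once per
-- distinct title with the inner query filtered to that title.
-- format(..., %L) quotes the title as a SQL literal.
create table word_counts as
select t.title,
       s.word,
       s.nentry as count          -- nentry = total occurrences of the word
from (select distinct title from posts) t
cross join lateral ts_stat(
    format('select to_tsvector(''simple'', description) from posts where title = %L',
           t.title)
) s;

-- Word counts for a single title:
select word, count
from word_counts
where title = 'title1'
order by word;
```

Because this executes one inner query per title, it may be slow on a table with many distinct titles. The unnest-based query above scans posts only once.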