How to run ts_stat for each row

Time: 2017-09-16 09:46:28

Tags: postgresql

I have a table 'posts' as follows:

 id | title  | description
----+--------+---------------------------------------------
  1 | title1 | A short description about title1
  2 | title2 | this is about title2
  3 | title2 | this is some other description
  4 | title1 | this is different from previous description

I have a few questions:

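  1. I need the count of each word across all descriptions for a given title. I tried ts_stat(), but it gives me word frequencies for the whole column, regardless of which title each row belongs to:

         select ts_stat($$select to_tsvector('simple', posts.description) from posts$$);

     I am looking for help creating a new table in which each row contains (title, word, count).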

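  2. Initially I considered one row per title, with the words and their counts stored comma-separated in a single column. Getting the count of a word for a given title would then take extra work, so I thought of adding one row per word per title instead.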

  3. If there is a better way, please let me know.

    version: PostgreSQL 9.5.8

1 answer:

Answer 0: (score: 0)

I could not come up with anything less monstrous than this:

t=# with c as (
  select to_tsvector('simple',unnest(string_to_array(description,' '))),title
  from posts
)
, d as (
  select translate(split_part(to_tsvector::text,':',1),$$'$$,'') ts,title
  from c
  where octet_length(to_tsvector::text) > 0
)
select ts,title,count(1)
from d
group by title,ts
order by 1;
     ts      |   title    | count
-------------+------------+-------
 a           |  title1    |     1
 about       |  title1    |     1
 about       |  title2    |     1
 description |  title1    |     2
 description |  title2    |     1
 different   |  title1    |     1
 from        |  title1    |     1
 is          |  title2    |     2
 is          |  title1    |     1
 other       |  title2    |     1
 previous    |  title1    |     1
 short       |  title1    |     1
 some        |  title2    |     1
 this        |  title1    |     1
 this        |  title2    |     2
 title1      |  title1    |     1
 title2      |  title2    |     1
(17 rows)

The per-title counts reconcile with the column-wide ts_stat output:

t=# select ts_stat('select to_tsvector($$simple$$,description) from posts') order by 1 ;
      ts_stat
-------------------
 (a,1,1)
 (about,2,2)
 (description,3,3)
 (different,1,1)
 (from,1,1)
 (is,3,3)
 (other,1,1)
 (previous,1,1)
 (short,1,1)
 (some,1,1)
 (this,3,3)
 (title1,1,1)
 (title2,1,1)
(13 rows)

But then again, my experience with FTS is very limited, so you can probably do better with the ts_* functions.
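One possible way to stay with ts_stat is to call it once per title through a lateral join, building the inner query text with format(). This is only a sketch, not tested against the asker's database: the table name post_word_counts and the column name word_count are hypothetical, and it assumes the word, ndoc, nentry output columns of ts_stat and that LATERAL is available (PostgreSQL 9.3 or later).

```sql
-- Sketch: one ts_stat call per distinct title.
-- post_word_counts and word_count are hypothetical names chosen for illustration.
create table post_word_counts as
select t.title,
       s.word,
       s.nentry as word_count   -- total occurrences within this title's descriptions
from (select distinct title from posts) t
cross join lateral ts_stat(
         -- %L quotes the title as a SQL literal inside the generated query text
         format('select to_tsvector(''simple'', description) from posts where title = %L',
                t.title)
       ) s;

-- Example lookup: word counts for a single title.
select word, word_count
from post_word_counts
where title = 'title1'
order by word;
```

Here nentry is the total number of occurrences of the word, while ndoc (not selected above) would be the number of rows containing it. The table is a snapshot, so it would need to be rebuilt whenever posts changes.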