Split a phrase into words and return all sub-groups in Postgres

Asked: 2017-11-01 17:08:24

Tags: sql postgresql select

I have a table of product names that looks like this:

Microsoft Word
Adobe Premiere
Paint
Mozila Firefox
Adobe Photoshop CS7
Windows Movie Maker

I want to select the data (table product, column name) so that it comes out like this:

Microsoft
Word
Microsoft Word
Adobe
Premiere
Adobe Premiere
Paint
Mozila firefox
Adobe 
Photoshop
CS7
Adobe Photoshop
Photoshop CS7
Windows
Movie
Maker

I am using Postgres. Is it possible to do this?

2 answers:

Answer 0 (score: 1)

db<>fiddle

It is not clear to me what your expected result is.

For Adobe Photoshop CS7, your result is:

Adobe 
Photoshop
CS7
Adobe Photoshop
Photoshop CS7

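What about the original string Adobe Photoshop CS7? For the solution I assume that you want all sub-phrases with the words in their original order, so the result should also include Adobe Photoshop CS7. Your other results (which do include the original string) suggest this.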


(1) First step: get all sub-phrases that start at the beginning:

String: A B C D E

A
A B
A B C
A B C D
A B C D E

Query

WITH single_words AS (
    SELECT *, row_number() OVER (PARTITION BY id) AS nth_word FROM (         -- B
        SELECT id, regexp_split_to_table(phrase, '\s') as word FROM phrases  -- A
    )s
)
SELECT 
    array_agg(word) OVER (PARTITION BY id ORDER BY nth_word) as phrase_part  -- C
FROM single_words;

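A: The WITH query simplifies things because the subquery only has to be written once (it is reused in (2)). The regexp_split_to_table function splits the string at whitespace and puts each word on its own row.

B: The window function row_number adds a counter to the words that indicates their position in the original string (https://www.postgresql.org/docs/current/static/tutorial-window.html).

C: The window function array_agg() OVER (... ORDER BY nth_word) aggregates the words into a list. The ORDER BY is needed to get a growing list of words in their original order. Without the ORDER BY, array_agg would add all words of the phrase at once, so every word row would get the complete phrase.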
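
The running aggregation in step (1) can also be followed outside the database. Below is a minimal Python sketch of the same idea, not the query itself: the function name prefix_phrases is made up for illustration, and the list position stands in for nth_word.

```python
import re


def prefix_phrases(phrase):
    """Return every sub-phrase that starts at the first word, shortest first."""
    # Split at single whitespace characters, like regexp_split_to_table(phrase, '\s').
    words = re.split(r"\s", phrase)
    # The k-th entry holds the first k words, like the running
    # array_agg(word) OVER (... ORDER BY nth_word).
    return [words[:k] for k in range(1, len(words) + 1)]


for part in prefix_phrases("A B C D E"):
    print(part)
```

Each printed list corresponds to one phrase_part row of the query for a single id.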


(2)第二步:从所有起点获取所有子短语:

String: A B C D E

A
B
C
D
E
A B
B C
C D
D E
A B C
B C D
C D E
A B C D
B C D E
A B C D E

Query

WITH single_words AS (                                                    -- A
    SELECT *, row_number() OVER (PARTITION BY id) AS nth_word FROM (
        SELECT id, regexp_split_to_table(phrase, '\s') as word FROM phrases
    )s
)
SELECT
    *,
    array_agg(b.word) OVER (PARTITION BY a.id, a.nth_word ORDER BY a.id, a.nth_word, b.nth_word) as phrase_part  -- C
FROM single_words a                                                       -- B
JOIN single_words b
ON (a.id = b.id AND a.nth_word <= b.nth_word)

A: Same as in (1).

B: Joins the phrase with itself. Put better: every word is joined with itself and with each following word of the same phrase.

C: This window function aggregates the words of each phrase into the result shown above.

If you do not like the array, you can use a function such as array_to_string to convert the result into a string.
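
As a cross-check of step (2), here is a minimal Python sketch that produces the same set of sub-phrases for one string. It is an illustration under assumptions, not the query: sub_phrases is a hypothetical name, the two loop variables play the roles of a.nth_word and b.nth_word, and " ".join does the array-to-string conversion.

```python
import re


def sub_phrases(phrase):
    """Return every run of consecutive words, grouped by starting word."""
    words = re.split(r"\s+", phrase.strip())
    result = []
    for start in range(len(words)):           # role of a.nth_word
        for end in range(start, len(words)):  # role of b.nth_word >= a.nth_word
            # Join the words from start to end into a single string.
            result.append(" ".join(words[start:end + 1]))
    return result


for part in sub_phrases("Adobe Photoshop CS7"):
    print(part)
```

For a phrase of n words this yields n * (n + 1) / 2 sub-phrases, including the original string.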

Answer 1 (score: 0)

You can use regexp_split_to_array:

CREATE TABLE s(c TEXT);
INSERT INTO s(c) VALUES('Microsoft Word'), ('Adobe Premiere');

SELECT unnest(regexp_split_to_array(s.c, '\s+'))
FROM s
UNION ALL
SELECT c
FROM s;

Rextester Demo

Edit

To get every combination (this assumes the table s also has an id column), you can use:

WITH src AS (
    SELECT id,name, rn::int, (MAX(rn) OVER(PARTITION BY id))::int AS m_rn
    FROM s, 
     unnest(regexp_split_to_array(s.c, '\s+')) WITH ORDINALITY AS sub(name,rn)
)
SELECT id, string_agg(b.Name ,' ' ORDER BY rn) AS combination
FROM (SELECT p.id, p.Name, p.rn, RIGHT(o.n::bit(16)::text, m_rn) AS bitmap
      FROM src AS p
      CROSS JOIN generate_series(1, 100000) AS o(n)     
      WHERE o.n < 2 ^ m_rn) b
WHERE SUBSTRING(b.bitmap, b.rn, 1) = '1'
GROUP BY b.id, b.bitmap
ORDER BY id, b.bitmap;

Rextester Demo 2
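
The bitmap trick in the query above may be easier to follow in a few lines of Python. This is only a sketch of the same idea with a made-up function name (word_combinations): every number from 1 to 2^n - 1 is written as an n-character bit string, and a word is kept when its position holds a '1'.

```python
import re


def word_combinations(phrase):
    """Return every non-empty selection of words, keeping the original order."""
    words = re.split(r"\s+", phrase.strip())
    n = len(words)
    combos = []
    for mask in range(1, 2 ** n):
        # Zero-padded bit string of length n, like RIGHT(n::bit(16)::text, m_rn).
        bitmap = format(mask, "0{}b".format(n))
        # Keep the words whose position in the bitmap is '1'.
        picked = [w for w, bit in zip(words, bitmap) if bit == "1"]
        combos.append(" ".join(picked))
    return combos


for combo in word_combinations("Adobe Photoshop CS7"):
    print(combo)
```

Note that this enumerates all subsets of words, so it also returns non-adjacent selections such as Adobe CS7, which the question's expected output does not list.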