PostgreSQL regex - split a column into an array

Time: 2016-02-25 17:38:16

Tags: regex postgresql

I have a table music:

author                |  music
----------------------+-------
Kevin Clein           |   a
Gucio G. Gustawo      |   b
R. R. Andrzej         |   c
John McKnight Burman  |   d

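How can I split a column that contains two different separators (space and dot), and how do I split it correctly into name and surname to get a result like this: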

author                |  name   | surname
----------------------+---------+----------------
Kevin Clein           |   Kevin | Clein           
Gucio G. Gustawo      | Gucio G.| Gustawo
R. R. Andrzej         |   R. R. | Andrzej
John McKnight Burman  |   John  | McKnight Burman

So far I have tried something like this:

WITH ad AS(
SELECT author,
  s[1] AS name,
  s[2] AS surname
  FROM (SELECT music.*,
  regexp_split_to_array(music.author,E'\\s[.]') AS s
       FROM music)t
)SELECT * FROM ad;

1 answer:

Answer 0: (score: 1)

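I have put together a possible solution for you. Note that it may not cover every case, and you will need an extra table to handle the rules. By rules I mean what I said in the comments:

  When to decide what is the name and what is the surname.

So, to solve your problem, I had to create another table that lists the names that should be treated as part of the surname.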

Test case scenario:

create table surname (
  id SERIAL NOT NULL primary key,
  sample varchar(100)
);

--Test case inserts
insert into surname (sample) values ('McKnight'), ('McGregory'), ('Willian'), ('Knight');

create table music (
  id SERIAL NOT NULL primary key,
  author varchar(100)
);

insert into music (author) values
('Kevin Clein'),
('Gucio G. Gustawo'),
('R. R. Andrzej'),
('John McKnight Burman'),
('John Willian Smith'),
('John Williame Smith');

My proposed solution:

select author,
       trim(replace(author, surname, '')) as name,
       surname
  from (
    select author,
          case when position(s.sample in m.author)>0 
          then (regexp_split_to_array( m.author, '\s(?='||s.sample||')' ))[2]::text
          else trim(substring( author from '\s\w+$'  ))
           end as surname
      from music m left join surname s 
        on m.author like '%'||s.sample||'%'
     where case when position(s.sample in m.author)>0 
          then (regexp_split_to_array( m.author, '\s(?='||s.sample||')' ))[2]::text
          else trim(substring( author from '\s\w+$'  )) end is not null
       ) as x

The output will be:

   AUTHOR              NAME             SURNAME
------------------------------------------------------------   
Kevin Clein            Kevin            Clein
Gucio G. Gustawo       Gucio G.         Gustawo
R. R. Andrzej          R. R.            Andrzej
John McKnight Burman   John             McKnight Burman
John Willian Smith     John             Willian Smith
John Williame Smith    John Williame    Smith
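
The two building blocks of the query can be tried in isolation. The statements below are only an illustrative sketch using literal strings instead of the tables, and they assume standard_conforming_strings is on (the default since PostgreSQL 9.1), so that '\s' is passed to the regex engine unchanged.

```sql
-- Split at the whitespace that is immediately followed by a known surname part.
-- The lookahead (?=...) keeps 'McKnight' in the second array element.
select regexp_split_to_array('John McKnight Burman', '\s(?=McKnight)');
-- {John,"McKnight Burman"}

-- 'Knight' is a substring of the author but is not preceded by whitespace,
-- so nothing is split and element [2] is NULL (the WHERE clause drops this row).
select (regexp_split_to_array('John McKnight Burman', '\s(?=Knight)'))[2];
-- NULL

-- Fallback when no entry in surname matches: take the last word.
select trim(substring('Gucio G. Gustawo' from '\s\w+$'));
-- Gustawo
```

This also shows a limitation: an author who matches a sample only in the middle of a word, and no other sample, is filtered out of the result entirely.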


In the surname table you insert all the names that should be treated as part of a surname.

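You may want to wrap the query that evaluates the CASE expression in a subquery, so that the WHERE clause can simply reference the resulting column instead of repeating the whole CASE expression.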
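
One way that suggestion might look is sketched below. It keeps the same logic as the proposed query, but the CASE expression is written once in the inner query and the NULL filter moves to the outer query. The alias candidates is only an illustrative name, and the sketch assumes the music and surname tables from the test scenario.

```sql
select author,
       trim(replace(author, surname, '')) as name,
       surname
  from (
    -- The CASE expression is evaluated once here and exposed as "surname".
    select m.author,
           case when position(s.sample in m.author) > 0
                then (regexp_split_to_array(m.author, '\s(?=' || s.sample || ')'))[2]::text
                else trim(substring(m.author from '\s\w+$'))
            end as surname
      from music m
      left join surname s
        on m.author like '%' || s.sample || '%'
       ) as candidates
 -- Filter on the computed column instead of repeating the CASE expression.
 where surname is not null;
```

With the test data above this should return the same six rows as the original query, although the row order is not guaranteed without an ORDER BY.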