Extract a specific run of digits from a string

Posted: 2014-09-17 14:20:44

Tags: sql regex postgresql regex-greedy

I have the following table (called data):

row    comments
  1    Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold
  2    No man 87485 is id# 548522 an island 65654.       
  3    125 Better id NEWLINE #546654 late than 5875565 never.
  4    555 Better id546654 late than 565 never
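To reproduce the question, the sample table can be created as below. This is a minimal sketch that assumes PostgreSQL, an integer "row" column and a text comments column (the question does not state the real column types, and it casts comments::text, so the original type may differ). The row 3 text keeps the literal word NEWLINE exactly as shown in the table.

```sql
-- Sample data from the question; column types are assumptions.
-- "row" is quoted to be safe, since ROW is also an SQL keyword.
CREATE TEMP TABLE data (
    "row"    integer PRIMARY KEY,
    comments text
);

INSERT INTO data ("row", comments) VALUES
    (1, 'Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold'),
    (2, 'No man 87485 is id# 548522 an island 65654.'),
    (3, '125 Better id NEWLINE #546654 late than 5875565 never.'),
    (4, '555 Better id546654 late than 565 never');
```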

I used the following query:

select row, substring(substring(comments::text, '((id|ID)[0-9]+)'), '[0-9]+') as id
from data
where comments::text ~* 'id[0-9]+';

The output of this query skips rows 1 to 3; it only finds the ID in row 4:

row   id
 4    546654

Does anyone know how to extract the ID numbers correctly? Note that an ID has at most 9 digits.

1 answer:

Answer 0 (score: 0)

Use regexp_replace():

SELECT c.rownr
        , regexp_replace (c.comments, e'.*[Ii][Dd][^0-9]*([0-9]+).*', '\1' ) AS the_id
        , c.comments AS comments
FROM comments c
        ;
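  • .* matches the leading junk; because it is greedy, when a comment contains several "id" strings followed by digits, the last one is the one captured
  • [Ii][Dd] matches the string "Id" in any letter case
  • [^0-9]* consumes any non-digit characters in between
  • ([0-9]+) matches the digit string you want
  • .* matches any trailing characters
  • '\1' (the third argument) replaces the whole match with whatever was matched inside the first (); since the pattern covers the entire string, only the digits remain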

Result:

 rownr | the_id |                         comments                                    
-------+--------+--------------------------------------------------------------------------------
     1 | 548545 | Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold
     2 | 548522 | No man 87485 is id# 548522 an island 65654.       
     3 | 546654 | 125 Better id NEWLINE #546654 late than 5875565 never.
     4 | 546654 | 555 Better id546654 late than 565 never
(4 rows)
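One caveat with the regexp_replace() approach: when a comment contains no "id" followed by digits, the pattern does not match and regexp_replace() returns the whole comment unchanged instead of an empty value. A possible alternative, sketched here for PostgreSQL, uses substring(... FROM pattern), which returns the text matched by the first parenthesized group, or NULL when there is no match. The helper name extract_id is hypothetical, and the {1,9} bound plus the ([^0-9]|$) terminator are additions meant to enforce the question's "at most 9 digits" rule. Note that substring() takes the leftmost match, so here the first "id" followed by digits wins, unlike the greedy leading .* in the answer.

```sql
-- Hypothetical helper: returns the first run of 1 to 9 digits that follows
-- "id" (any letter case, optionally separated by non-digits), or NULL.
CREATE OR REPLACE FUNCTION extract_id(comment_text text)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    -- substring(... FROM pattern) returns the match of the first
    -- parenthesized group; ([^0-9]|$) rejects runs longer than 9 digits.
    SELECT substring(comment_text FROM '[Ii][Dd][^0-9]*([0-9]{1,9})([^0-9]|$)')
$$;

-- Usage against the sample rows from the question.
SELECT t.rownr, extract_id(t.comments) AS the_id
FROM (VALUES
    (1, 'Fortune favors https://something.aaa.org/show_screen.cgi?id=548545 the 23 bold'),
    (2, 'No man 87485 is id# 548522 an island 65654.'),
    (3, '125 Better id NEWLINE #546654 late than 5875565 never.'),
    (4, '555 Better id546654 late than 565 never')
) AS t(rownr, comments);
```

For these four rows it should return the same IDs as the result above, while a comment without a matching "id" yields NULL.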