Question

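I have a Redshift database that I need to query, and I need to group similar strings together. I am using regexp_replace() to do this, but I can't figure out how to group strings that have an integer in the middle. For example: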

Dataset:

string
'aaa/123/bbb'
'aaa/456/bbb'
'ccc/123/ddd'

I need to group these so that we get:

string     count(*)
aaa/id/bbb 2
ccc/id/ddd 1

So I tried using:

regexp_replace(endpoint, '/[0-9]+$/', '/id/')

But it doesn't work. I assume that's because there is no wildcard or something like that? I can't figure out how to fix it.

Thanks in advance.

Answer 1

I understand that you also want to replace a number at the end of the string. This comes close to what you want:

select regexp_replace(endpoint, '/[0-9]+(/|$)', '/id/')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x

However, in the second case it returns a trailing slash ('aaa/123' becomes 'aaa/id/').

If no other path segment starts with a digit, you can do this:

select regexp_replace(endpoint, '/[0-9]+', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x
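
To see why that condition matters, here is a small sketch. The row 'aaa/123abc/bbb' is a made-up value, not taken from the question's data. Because the pattern only requires a slash followed by digits, it also rewrites the leading digits of a segment that is not purely numeric:

```sql
-- Hypothetical sample rows; 'aaa/123abc/bbb' is invented to illustrate the caveat.
select endpoint,
       regexp_replace(endpoint, '/[0-9]+', '/id') as normalized
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123abc/bbb' as endpoint
      ) x;
-- 'aaa/123/bbb'    -> 'aaa/id/bbb'
-- 'aaa/123abc/bbb' -> 'aaa/idabc/bbb' (only the leading digits are replaced)
```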

Otherwise, two calls to regexp_replace() do the trick:

-- inner call: numeric segment in the middle; outer call: numeric segment at the end
select regexp_replace(regexp_replace(endpoint, '/[0-9]+/', '/id/'), '/[0-9]+$', '/id')
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/123' as endpoint 
      ) x;
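
To get the counts asked for in the question, the normalized value can then be used as the grouping key. The following is a minimal sketch using the three sample rows from the question; the alias endpoint_group is an assumed name chosen for illustration. Note that Redshift's regexp_replace() replaces every match, whereas PostgreSQL replaces only the first match unless the 'g' flag is passed as a fourth argument. The sample rows contain at most one numeric segment each, so the difference does not show up here.

```sql
-- Normalize numeric path segments to 'id', then count rows per normalized value.
-- endpoint_group is an illustrative alias, not a name from the original question.
select regexp_replace(regexp_replace(endpoint, '/[0-9]+/', '/id/'), '/[0-9]+$', '/id') as endpoint_group,
       count(*)
from (select 'aaa/123/bbb' as endpoint union all
      select 'aaa/456/bbb' as endpoint union all
      select 'ccc/123/ddd' as endpoint
      ) x
group by 1
order by 1;
-- aaa/id/bbb  2
-- ccc/id/ddd  1
```

In a real query the inline subquery would be replaced by the actual table that holds the endpoint column.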
