Question

I'm trying to find the best way to strip a string pattern from a column's values without affecting the actual data.

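In a Redshift DW, I have a table column company in which some records end with INC. I want to remove the INC pattern and keep only the company name. See the sample data and expected output below.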

WITH T AS (
    select 'Cincin,Inc' id
    union all
    select 'Tinc, INc.' id 
    union all
    select 'Cloud' id 
    union all
    select 'Dinct Inc.' id 
)

select id , regexp_replace(id,{exp}) from T


/** Expected output **/
Cincin
Tinc
Cloud
Dinct

Answer 1

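Redshift doesn't support case-insensitive regular expression matching, but since the target string is short, you can work around that with [Ii][Nn][Cc] without much hassle: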

regexp_replace(id, ',? *[Ii][Nn][Cc]\.?$', '')

See the live demo.

Test:

WITH T AS (
    select 'Cincin,Inc' id
    union all
    select 'Tinc, INc.' id 
    union all
    select 'Cloud' id 
    union all
    select 'Dinct Inc.' id 
)    
select id , regexp_replace(id, ',? *[Ii][Nn][Cc]\.?$', '') from T

Output:

Cincin
Tinc
Cloud
Dinct
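To sanity-check this pattern without a Redshift cluster, the sketch below runs the same expression through Python's re module, annotated piece by piece with re.VERBOSE. Python's engine is a swapped-in stand-in, not Redshift's; the assumption is that these simple constructs (optional comma, spaces, per-letter character classes, escaped dot, end anchor) behave the same in both. In verbose mode a bare space is ignored, so the " *" from the answer is written as "[ ]*". The helper name strip_inc and the extra value "Zinc" are hypothetical.

```python
import re

# Answer 1's pattern, spelled out with comments (Python re used as a stand-in for Redshift).
INC_SUFFIX = re.compile(
    r"""
    ,?            # optional comma right after the name
    [ ]*          # zero or more literal spaces (" *" in the SQL version)
    [Ii][Nn][Cc]  # "inc" in any capitalization, one character class per letter
    \.?           # optional trailing period
    $             # only at the very end of the value
    """,
    re.VERBOSE,
)


def strip_inc(name: str) -> str:
    """Hypothetical helper: drop a trailing Inc/INC/inc suffix, keep the rest unchanged."""
    return INC_SUFFIX.sub("", name)


samples = ["Cincin,Inc", "Tinc, INc.", "Cloud", "Dinct Inc."]
print([strip_inc(s) for s in samples])  # ['Cincin', 'Tinc', 'Cloud', 'Dinct']

# Both the comma and the spaces are optional, so a name that merely ends in "inc"
# is trimmed as well.
print(strip_inc("Zinc"))  # Z
```

The original capitalization of the company name is preserved, which matches the question's requirement. The last line shows the trade-off of making the separator optional: if names like "Zinc" can occur, requiring at least one comma or space before the suffix would avoid trimming them.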

Answer 2

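Try replacing the pattern ,?\s*Inc\.?$ (in the query below, Inc is written as [Ii][Nn][Cc] so that any capitalization matches):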

select id, regexp_replace(id, ',?\\s*[Ii][Nn][Cc]\\.?$', '') from T
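The only functional difference from Answer 1 is \s* in place of " *": \s also matches tabs and other whitespace, not just spaces. A quick check with Python's re module (an assumed stand-in for Redshift's engine) shows where that matters. The tab-separated value is a hypothetical input, not from the question. Answer 2 writes the escapes with doubled backslashes inside the SQL string literal; the Python raw strings below use single backslashes.

```python
import re

# Answer 1: only literal spaces may precede the suffix.
SPACES_ONLY = re.compile(r",? *[Ii][Nn][Cc]\.?$")
# Answer 2: any whitespace (\s) may precede the suffix.
ANY_WHITESPACE = re.compile(r",?\s*[Ii][Nn][Cc]\.?$")

# Hypothetical value with a tab between the name and the suffix.
name = "Acme\tInc."

print(repr(SPACES_ONLY.sub("", name)))     # 'Acme\t'  (the tab is left behind)
print(repr(ANY_WHITESPACE.sub("", name)))  # 'Acme'
```

For the four sample rows in the question both patterns give the same result; they only diverge when the separator contains non-space whitespace.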

Answer 3

If you don't care about letter case, you can use this:

WITH T AS (
    select 'Cincin,Inc' id
    union all
    select 'Tinc, INc.' id
    union all
    select 'Cloud' id
    union all
    select 'Dinct Inc.' id
)

-- lower() makes the match case-insensitive, but the returned name is lower-cased too
select id , regexp_replace(lower(id),'[^a-z]+(inc)([^a-z])*','')
from T

Output:

  id        regexp_replace
Cincin,Inc  cincin
Tinc, INc.  tinc
Cloud       cloud
Dinct Inc.  dinct
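Since the question asks not to affect the actual data, two properties of this answer are worth checking: the returned name is lower-cased, and the pattern has no $ anchor, so it can also match an "inc" in the middle of a value. The sketch below uses Python's re module as an assumed stand-in for Redshift's engine and str.lower() in place of SQL lower(). "Acme Incorporated" is a hypothetical value, and the ANCHORED variant is a comparison added for illustration, not part of Answer 3.

```python
import re

# Answer 3's pattern, applied after lower-casing (mirrors regexp_replace(lower(id), ...)).
ANSWER3 = re.compile(r"[^a-z]+(inc)([^a-z])*")
# Hypothetical end-anchored variant, for comparison only.
ANCHORED = re.compile(r"[^a-z]+inc[^a-z]*$")

samples = ["Cincin,Inc", "Tinc, INc.", "Cloud", "Dinct Inc."]
print([ANSWER3.sub("", s.lower()) for s in samples])
# ['cincin', 'tinc', 'cloud', 'dinct']

# Without an anchor, " inc" inside a longer word is removed as well.
print(ANSWER3.sub("", "acme incorporated"))   # acmeorporated
print(ANCHORED.sub("", "acme incorporated"))  # acme incorporated
```

On the sample rows the two patterns agree; the anchor only matters once the data contains words that start with "inc" after a space or punctuation.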

Regular expression to replace a string pattern
