Regex: detect any substring of the alphabet?

Time: 2014-09-23 15:02:55

Tags: regex oracle

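Step one

  • Defining the "alphabet" as ABCDEFGHIJKLMNOPQRSTUVWXYZ, I want to find any substring of the alphabet. I need to build more on top of this, but it is my first challenge.

End goal

  • Given a pattern of characters (A-Z) with no repeats, no spaces, and only ascending characters (ABDE, never ABED), replace every missing letter of the alphabet with a single space, in an Oracle statement. So a value in a row might read ABCDEGHIJKLMNOPQTUVWXYZ (missing F and RS) and would need to read ABCDE GHIJKLMNOPQ  TUVWXYZ

Is this even possible?

David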

5 answers:

Answer 0 (score: 3)

Tried this out of curiosity...

WITH FULL_ALPHABETS AS
  (
    SELECT CHR(64+level) alpha,rownum AS id
      FROM DUAL
    CONNECT BY LEVEL<=26
  ),
INPUT_ALPHABETS AS
  (
    SELECT SUBSTR(UPPER('ABCDEFYZ'),level,1) alpha, rownum AS id
      FROM dual
    CONNECT BY level <= LENGTH(UPPER('ABCDEFYZ'))
  )
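-- The query returns a single aggregated row, so no trailing ORDER BY is needed;
-- the order of the letters is set by WITHIN GROUP.
SELECT LISTAGG(NVL(I.ALPHA,' ')) WITHIN GROUP (ORDER BY F.ALPHA) AS RESULT
  FROM FULL_ALPHABETS F
    LEFT OUTER JOIN INPUT_ALPHABETS I
     ON (F.ALPHA = I.ALPHA);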

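Answer 1 (score: 3)

For a single value you can use two connect-by clauses: one to generate the 26 values, the other to split the original string into individual characters. Since the ASCII codes are consecutive, the ascii() function can be used to generate a number from 1 to 26 for each character that is present. Then left-join the two lists: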

var str varchar2(26);
exec :str := 'ABCDFGZ';

with alphabet as (
  select level as pos
  from dual connect by level <= 26
),
chars as (
  select substr(:str, level, 1) character,
    ascii(substr(:str, level, 1)) - 64 as pos
  from dual connect by level <= length(:str)
)
select listagg(nvl(chars.character, ' '))
  within group (order by alphabet.pos) as result
from alphabet
left outer join chars on chars.pos = alphabet.pos;

RESULT                   
--------------------------
ABCD FG                  Z 

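That uses a SQL*Plus bind variable to avoid repeating the string, but the value could be plugged in from elsewhere.

Creating a view is a bit more complicated, because multiple rows in the table cause problems with the connect-by. The list of possible values has to include the primary key (or at least a unique key) from the table, plus the original string if you want to show it (and any other columns you need from the table). The split list also needs to include the primary key, and the key has to be part of the outer join.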

create view v42 as
with possible as (
  select id, str, level as pos
  from t42
  connect by level <= 26
  and prior id = id
  and prior sys_guid() is not null
),
actual as (
  select id, substr(str, level, 1) character,
    ascii(substr(str, level, 1)) - 64 as pos
  from t42
  connect by level <= length(str)
  and prior id = id
  and prior sys_guid() is not null
)
select possible.id, possible.str, listagg(nvl(actual.character, ' '))
  within group (order by possible.pos) as result
from possible
left outer join actual on actual.id = possible.id and actual.pos = possible.pos
group by possible.id, possible.str;

Then, with some sample data, select * from v42 gives:

        ID STR                        RESULT                   
---------- -------------------------- --------------------------
         1 A                          A                          
         2 Z                                                   Z 
         3 AZ                         A                        Z 
         4 ABCDFGZ                    ABCD FG                  Z 
         5 ABCDEGHIJKLMNOPQTUVWXYZ    ABCDE GHIJKLMNOPQ  TUVWXYZ 
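
The answer does not show how t42 was created. The following is a minimal sketch of a table that would produce the output above; the column types and the primary key are assumptions inferred from the view and its output, not taken from the original.

```sql
-- Hypothetical definition of t42; the data types are assumptions.
create table t42 (
  id  number primary key,
  str varchar2(26)
);

-- Sample rows matching the ID and STR columns shown above.
insert into t42 (id, str) values (1, 'A');
insert into t42 (id, str) values (2, 'Z');
insert into t42 (id, str) values (3, 'AZ');
insert into t42 (id, str) values (4, 'ABCDFGZ');
insert into t42 (id, str) values (5, 'ABCDEGHIJKLMNOPQTUVWXYZ');

commit;
```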

SQL Fiddle demo

It might be a bit cleaner with a recursive CTE. Or with a function that processes one value at a time. Or with regular expressions, of course...

Here's a recursive CTE version, for fun:

create view v42 as
with possible(id, str, pos, character) as (
  select id, str, 1, 'A'
  from t42
  union all
  select id, str, pos + 1, chr(64 + pos + 1)
  from possible
  where pos < 26
),
actual (id, str, pos, character) as (
  select id, str, 1, substr(str, 1, 1)
  from t42
  union all
  select id, str, pos + 1, substr(str, pos + 1, 1)
  from actual
  where pos < length(str)
)
select possible.id, possible.str, listagg(nvl(actual.character, ' '))
  within group (order by possible.pos) as result
from possible
left outer join actual
on actual.id = possible.id
and actual.character = possible.character
group by possible.id, possible.str;

(SQL Fiddle does odd things with the spacing, so view the plain-text output from the 'Run SQL' drop-down.)
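
The function-based alternative mentioned earlier is not shown in this answer. As a rough sketch only: the function name fill_gaps and its parameter are hypothetical, and it assumes the input contains nothing but upper-case A-Z. It walks the 26 letters and keeps each one that occurs in the input, emitting a space otherwise.

```sql
-- Hypothetical helper; the name and signature are assumptions.
create or replace function fill_gaps(p_str in varchar2)
  return varchar2
  deterministic
is
  l_result varchar2(26);
  l_letter varchar2(1);
begin
  for i in 1 .. 26 loop
    l_letter := chr(64 + i);          -- 65 = 'A' ... 90 = 'Z'
    if instr(p_str, l_letter) > 0 then
      l_result := l_result || l_letter;
    else
      l_result := l_result || ' ';    -- letter missing (or p_str is null)
    end if;
  end loop;
  return l_result;
end fill_gaps;
/
```

It could then be called per row, for example select id, str, fill_gaps(str) as result from t42, which avoids the connect-by and the outer join at the cost of a PL/SQL call for each row.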

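Answer 2 (score: 2)

Important note: this approach does not work in Oracle, because lookarounds are not supported.

You can do it with a little trick:

  1. First, concatenate ABCDEFGHIJKLMNOPQRSTUVWXYZ, a newline, and your string.

  2. Replace with a space using the pattern (?!([A-Z]+)(?=.*\1))[A-Z] (use the dotall modifier so that the dot can match the newline).

  3. Split the string on the newline and keep the first part.

regex demo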

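Answer 3 (score: 1)

I'm currently on 10g, so there is no LISTAGG. My approach ended up similar to the others, but this is what I came up with. I should mention that WM_CONCAT is not supported by Oracle, in case that bothers you.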

select replace(wm_concat(OUTPUT_CHAR),',') OUTPUT_STRING 
from
  (select nvl(INPUT_STRING.INPUT_CHAR,' ') OUTPUT_CHAR
   from 
    (select chr(64 + level) LETTER 
     from dual connect by level <= 26) ALPHABET
   left join 
    (select substr(:input_string, level, 1) INPUT_CHAR 
     from dual connect by level <= length(:input_string)) INPUT_STRING
   on ALPHABET.LETTER = INPUT_STRING.INPUT_CHAR
   order by ALPHABET.LETTER);

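Answer 4 (score: 0)

There is no need to split and aggregate the string. A simple regexp_replace will do.

The given string is wrapped in [^ ], so every letter of the alphabet that is not in that list is replaced with a space.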

SQL> var str varchar2(26);
SQL> exec :str := 'AQ';

SQL> select regexp_replace(
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  '[^'||:str||']'
  ,' '
)                               resulting_str
from dual;

RESULTING_STR
------------------------------
A               Q


SQL> exec :str := 'A';

SQL> select regexp_replace(
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  '[^'||:str||']'
  ,' '
)                               resulting_str
from dual;

RESULTING_STR
------------------------------
A


SQL> exec :str := 'Z';

SQL> select regexp_replace(
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
  '[^'||:str||']'
  ,' '
)                               resulting_str
from dual;

RESULTING_STR
------------------------------
                         Z
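
The same expression can be applied to a column instead of a bind variable. The sketch below assumes a table t42 with columns id and str (borrowed from Answer 1, not part of this answer) and assumes str holds only upper-case A-Z; any other character would end up inside the bracket expression and could change its meaning.

```sql
-- t42(id, str) is assumed here; it is the table used by the view in Answer 1.
select id,
       str,
       regexp_replace('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                      '[^' || str || ']',
                      ' ') as result
from t42
where str is not null;  -- precaution: a null str would leave an empty bracket list
```

For the string from the question, ABCDEGHIJKLMNOPQTUVWXYZ, the pattern becomes [^ABCDEGHIJKLMNOPQTUVWXYZ], so only F, R and S are replaced with spaces.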