Select rows whose first character, last character, or both are special characters or punctuation, unless they only have a period at the end

Asked: 2019-04-04 10:47:37

Tags: sql oracle regexp-like

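I need to retrieve rows from a table where the name starts or ends with whitespace ([:space:]) or another special character ([:punct:]), but without the names that merely end in a single dot (.). The idea is to pull out names that may be inconsistent.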

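Examples that must appear:

  1. 'GEORGE & SON ' - has an extra space at the end.
  2. '-GEORGE & SON' - has an extra - at the beginning.
  3. '&GEORGE & SON' - has an extra & at the beginning.
  4. '-GEORGE & SON S.A.' - has an extra - at the beginning. The dot at the end is fine.
  5. 'GEORGE & SON..' - has not one but two dots at the end. Strings ending in more than one . are the exception: they are bad names too.

Examples that must not appear:

  1. 'GEORGE & SON.' - has only a single dot at the end.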

I am using the expression:

REGEXP_LIKE(col, '(^[[:punct:]]|[[:punct:]]$)|(^[[:space:]]|[[:space:]]$)')

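However, while this retrieves the names that start or end with whitespace or a special character, it also pulls the names that have a dot (.) as the last character.

How can I change this to get the desired result?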

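2 answers:

Answer 0: (score: 0)

Just add {2} after the second [[:punct:]]. The string then has to end in at least two punctuation characters, so a single trailing dot no longer matches.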

with tab as(
  select 'GEORGE & SON ' as s from dual union all
  select '-GEORGE & SON'  as s from dual union all
  select '&GEORGE & SON'  as s from dual union all
  select 'GEORGE & SON..'  as s from dual union all
  select 'GEORGE & SON.'  as s from dual union all
  select '-GEORGE & SON S.A.' as s from dual  
)
select * from  tab 
where REGEXP_LIKE(s, '(^[[:punct:]]|[[:punct:]]{2}$)|(^[[:space:]]|[[:space:]]$)') 
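
To check what the {2} quantifier does to trailing punctuation, a small test along these lines may help. It is only a sketch: the row 'GEORGE & SON!' is borrowed from the test data of the other answer, and the column alias flagged is made up for illustration.

```sql
WITH tab AS (
  SELECT 'GEORGE & SON.'  AS s FROM dual UNION ALL  -- single trailing dot
  SELECT 'GEORGE & SON..' AS s FROM dual UNION ALL  -- two trailing dots
  SELECT 'GEORGE & SON!'  AS s FROM dual            -- single trailing non-dot punctuation
)
SELECT s,
       CASE
         WHEN REGEXP_LIKE(s, '(^[[:punct:]]|[[:punct:]]{2}$)|(^[[:space:]]|[[:space:]]$)')
         THEN 'Y' ELSE 'N'
       END AS flagged
  FROM tab;
```

Only 'GEORGE & SON..' should come back as Y. The single trailing ! is not flagged either, because {2} applies to every punctuation character at the end of the string, not just to the dot.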

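Answer 1: (score: 0)

Since the predefined punctuation class is not suitable for the end of the string, a custom character class is used instead. The dot is left out deliberately. The single quote is added separately (escaping it did not work, and it may be hard to find suitable delimiter characters for the q operator in this context). The closing square bracket is added on its own, since Oracle does not seem to handle it properly when it is escaped. Finally, trailing consecutive dots are added explicitly: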

WITH T (id, col) AS (
  SELECT 1, 'GEORGE & SON ' FROM DUAL UNION ALL
  SELECT 2, '-GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 3, '&GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 4, 'GEORGE & SON..'  FROM DUAL UNION ALL
  SELECT 5, 'GEORGE & SON.'  FROM DUAL UNION ALL
  SELECT 6, '-GEORGE & SON S.A.' FROM DUAL UNION ALL
  SELECT 7, 'GEORGE & SON!' FROM DUAL UNION ALL
  SELECT 8, 'GEORGE & SON"' FROM DUAL UNION ALL
  SELECT 9, 'GEORGE & SON#' FROM DUAL UNION ALL
  SELECT 10, 'GEORGE & SON$' FROM DUAL UNION ALL
  SELECT 11, 'GEORGE & SON%' FROM DUAL UNION ALL
  SELECT 12, 'GEORGE & SON&' FROM DUAL UNION ALL
  SELECT 13, 'GEORGE & SON(' FROM DUAL UNION ALL
  SELECT 14, 'GEORGE & SON)' FROM DUAL UNION ALL
  SELECT 15, 'GEORGE & SON*' FROM DUAL UNION ALL
  SELECT 16, 'GEORGE & SON+' FROM DUAL UNION ALL
  SELECT 17, 'GEORGE & SON,' FROM DUAL UNION ALL
  SELECT 18, 'GEORGE & SON\' FROM DUAL UNION ALL
  SELECT 19, 'GEORGE & SON-' FROM DUAL UNION ALL
  SELECT 20, 'GEORGE & SON\' FROM DUAL UNION ALL
  SELECT 21, 'GEORGE & SON/' FROM DUAL UNION ALL
  SELECT 22, 'GEORGE & SON:' FROM DUAL UNION ALL
  SELECT 23, 'GEORGE & SON;' FROM DUAL UNION ALL
  SELECT 24, 'GEORGE & SON<' FROM DUAL UNION ALL
  SELECT 25, 'GEORGE & SON=' FROM DUAL UNION ALL
  SELECT 26, 'GEORGE & SON>' FROM DUAL UNION ALL
  SELECT 27, 'GEORGE & SON?' FROM DUAL UNION ALL
  SELECT 28, 'GEORGE & SON@' FROM DUAL UNION ALL
  SELECT 29, 'GEORGE & SON[' FROM DUAL UNION ALL
  SELECT 30, 'GEORGE & SON^' FROM DUAL UNION ALL
  SELECT 31, 'GEORGE & SON_' FROM DUAL UNION ALL
  SELECT 32, 'GEORGE & SON`' FROM DUAL UNION ALL
  SELECT 33, 'GEORGE & SON{' FROM DUAL UNION ALL
  SELECT 34, 'GEORGE & SON|' FROM DUAL UNION ALL
  SELECT 35, 'GEORGE & SON}' FROM DUAL UNION ALL
  SELECT 36, 'GEORGE & SON~' FROM DUAL UNION ALL
  SELECT 37, 'GEORGE & SON''' FROM DUAL UNION ALL
  SELECT 38, 'GEORGE & SON]' FROM DUAL)
SELECT
  * FROM T
 WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)')
 ORDER BY id
;

Updated requirements

Punctuation followed by a single dot

Add an optional dot after the special-character set; change

'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$'

to

'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$'

as in:
WITH T (id, col) AS (
  SELECT 40, 'GEORGE & SON^.' FROM DUAL UNION ALL
  SELECT 41, 'GEORGE & SON_.' FROM DUAL UNION ALL
  SELECT 42, 'GEORGE & SON`.' FROM DUAL UNION ALL
  SELECT 43, 'GEORGE & SON{.' FROM DUAL UNION ALL
  SELECT 44, 'GEORGE & SON|.' FROM DUAL UNION ALL
  SELECT 45, 'GEORGE & SON}.' FROM DUAL UNION ALL
  SELECT 46, 'GEORGE & SON~.' FROM DUAL UNION ALL
  SELECT 47, 'GEORGE & SON''.' FROM DUAL UNION ALL
  SELECT 48, 'GEORGE & SON].' FROM DUAL)
SELECT
  * FROM T
 WHERE REGEXP_LIKE(col, '([-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]\.?$')
 ORDER BY id
;

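Repetitions (combinations) of spaces and special characters inside the string

Originally, only leading and trailing occurrences were asked for... ;-)

A sequence of two or more space/punctuation characters is captured by

[[:space:][:punct:]]{2,}

If you want this explicitly inside the string, surround it with word characters:

\w[[:space:][:punct:]]{2,}\w

Leading and trailing runs of spaces are already matched as soon as a single space is found, so there is no need to handle them explicitly.
  This gives: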

WITH T (id, col) AS (
  SELECT 50, 'GEORGE & SON  ' FROM DUAL UNION ALL
  SELECT 51, 'GEORGE & SON   '  FROM DUAL UNION ALL
  SELECT 52, '  GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 53, '    GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 54, 'GEORGE &  SON'  FROM DUAL UNION ALL
  SELECT 55, 'GEORGE  & SON S.A.' FROM DUAL UNION ALL
  SELECT 56, 'GEORGE & SON    S.A.' FROM DUAL UNION ALL
  SELECT 60, '  GEORGE and SON'  FROM DUAL UNION ALL
  SELECT 61, ' ,GEORGE and SON' FROM DUAL UNION ALL
  SELECT 62, ', GEORGE and SON'  FROM DUAL UNION ALL
  SELECT 63, 'GEORGE -- SON' FROM DUAL UNION ALL
  SELECT 64, 'GEORGE --SON' FROM DUAL UNION ALL
  SELECT 65, 'GEORGE & SON' FROM DUAL UNION ALL
  SELECT 66, 'GEORGE + SON' FROM DUAL UNION ALL
  SELECT 67, 'GEORGE and  , SON' FROM DUAL UNION ALL
  SELECT 68, 'GEORGE and , SON' FROM DUAL UNION ALL
  SELECT 69, 'GEORGE and SON ,'  FROM DUAL UNION ALL
  SELECT 70, 'GEORGE and SON. '  FROM DUAL UNION ALL
  SELECT 71, 'GEORGE and+-SON'  FROM DUAL)
SELECT
  * FROM T
--  WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|[[:space:][:punct:]]{2,}')
  WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|\w[[:space:][:punct:]]{2,}\w')
  ORDER BY id
;

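This does produce false positives, though, most notably 'GEORGE & SON'. To some degree this can be avoided by replacing [:punct:] with a less inclusive set. The (final) choice will depend on whether false negatives or false positives are the bigger concern.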
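
As a rough sketch of such a less inclusive set: the class below lists only whitespace and a handful of separators, and leaves out &. Which characters belong in it is an assumption that has to be tuned to the actual data; the set used here is purely illustrative.

```sql
WITH T (id, col) AS (
  SELECT 63, 'GEORGE -- SON'    FROM DUAL UNION ALL
  SELECT 65, 'GEORGE & SON'     FROM DUAL UNION ALL
  SELECT 68, 'GEORGE and , SON' FROM DUAL UNION ALL
  SELECT 71, 'GEORGE and+-SON'  FROM DUAL)
SELECT
  * FROM T
  -- illustrative set: whitespace , ; : + and - (the hyphen comes last so it is taken literally)
 WHERE REGEXP_LIKE(col, '\w[[:space:],;:+-]{2,}\w')
 ORDER BY id
;
```

Since & is not in the set, ' & ' no longer forms a run of two or more listed characters, so 'GEORGE & SON' should drop out of the result. The price is a false negative for something like 'GEORGE &  SON' (id 54), where the doubled space follows an & rather than a word character.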


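Catch arbitrary sequences of punctuation and space characters, but allow a single letter followed by a single dot and a single space

As noted before, false positives have to be balanced against false negatives, one way or the other. However, this may be a good moment to consider splitting the overall problem into smaller ones and handling them separately. Even if 'GEORGE and P. SON' is perfectly acceptable, you may still want to see e.g. '-GEORGE and P. SON'. So let's concentrate on stray character sequences in the middle of the string, while also remembering the earlier ' & ' case and allowing enumerations (and thus commas):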

WHERE
  REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
  AND
  NOT REGEXP_LIKE(col, ' [[:upper:]]\. \w')
  AND
  NOT INSTR(col, ', ') > 0
  AND
  NOT INSTR(col, ' & ') > 0

possibly followed by

  WHERE
  REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
  AND
  (REGEXP_LIKE(col, ' [[:upper:]]\. \w')
   OR
   INSTR(col, ', ') > 0
   OR
   INSTR(col, ' & ') > 0
  )

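to find the suspicious names (along the lines of GEORGE and SON) that hide among the many valid ones. INSTR might be faster than a regular expression, depending on the overall situation...

A few words on the mechanics

(i) [[:punct:][:space:]] is essentially the union of [[:punct:]] and [[:space:]]. As far as picking a character from the class is concerned, the order does not matter.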

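(ii)

[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']

is

[-!"#$%&()*+,\/:;<=>?@[^_`{|}~]

with the single quote added. If you try to write the quote into the set directly, Oracle takes it as the end of the argument value, and escaping the single quote with a backslash does not work... So basically, this is what "the single quote is added separately" above refers to.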
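
A literal single quote can also be written inline by doubling it, which is the standard way to embed a quote in an Oracle string literal. The following is a small check, not part of the original answer, that both spellings build the same pattern string; the alias quote_spelling is only for illustration.

```sql
SELECT CASE
         -- left: quote added by concatenation; right: quote doubled inside one literal
         WHEN '[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$'
            = '[-!"#$%&()*+,\/:;<=>?@[^_`{|}~'']$'
         THEN 'same' ELSE 'different'
       END AS quote_spelling
  FROM DUAL;
```

This should return 'same'. Whether the concatenated or the doubled form is easier to read is a matter of taste.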

If and where this needs adjusting or further detail, please leave a comment.