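I need to retrieve rows from a table where the name starts or ends with whitespace ([:space:]) or another special character ([:punct:]), with one exception: a single dot (.) at the end of the name is acceptable. The idea is to pull out names that may be inconsistent.
Examples that must appear:
'GEORGE & SON ' - has an extra space at the end.
'-GEORGE & SON' - has an extra - at the start.
'&GEORGE & SON' - has an extra & at the start.
'-GEORGE & SON S.A.' - has an extra - at the start. The dot at the end is fine.
'GEORGE & SON..' - ends with two dots rather than one. Strings ending in more than one . are the exception to the exception: they are bad names too.
Examples that must not appear:
'GEORGE & SON.' - has only a single "." at the end.
I am using the expression: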
REGEXP_LIKE(col, '(^[[:punct:]]|[[:punct:]]$)|(^[[:space:]]|[[:space:]]$)')
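This does retrieve the names that start or end with whitespace or a special character, but it also pulls out names whose last character is a single dot ".".
How can I change the expression to get the desired result?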
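Answer 0: (score: 0)
Just add {2} after the second [[:punct:]]. The pattern then requires two punctuation characters at the end of the string, so a single trailing dot no longer matches (and neither does any other single trailing punctuation character):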
with tab as(
select 'GEORGE & SON ' as s from dual union all
select '-GEORGE & SON' as s from dual union all
select '&GEORGE & SON' as s from dual union all
select 'GEORGE & SON..' as s from dual union all
select 'GEORGE & SON.' as s from dual union all
select '-GEORGE & SON S.A.' as s from dual
)
select * from tab
where REGEXP_LIKE(s, '(^[[:punct:]]|[[:punct:]]{2}$)|(^[[:space:]]|[[:space:]]$)')
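One thing to watch with the {2} variant: the quantifier applies to every punctuation character, so a name with a single trailing special character other than a dot, such as 'GEORGE & SON!', is no longer returned. A possible alternative keeps the broad check and subtracts the accepted case in a second condition. It is only a sketch, and it assumes that a single trailing dot is acceptable only when the rest of the name is clean. The sample rows and the second pattern are illustrative and not part of the original answer:

```sql
-- Sketch only: flag names that start or end with punctuation/whitespace,
-- then exclude names whose sole issue is one trailing dot.
WITH tab AS (
  SELECT 'GEORGE & SON ' AS s FROM dual UNION ALL  -- trailing space: flagged
  SELECT '-GEORGE & SON'      FROM dual UNION ALL  -- leading -: flagged
  SELECT '-GEORGE & SON S.A.' FROM dual UNION ALL  -- leading -: flagged
  SELECT 'GEORGE & SON..'     FROM dual UNION ALL  -- two trailing dots: flagged
  SELECT 'GEORGE & SON!'      FROM dual UNION ALL  -- single trailing !: flagged
  SELECT 'GEORGE & SON.'      FROM dual UNION ALL  -- single trailing dot: not flagged
  SELECT 'GEORGE & SON'       FROM dual            -- clean name: not flagged
)
SELECT s
FROM tab
WHERE REGEXP_LIKE(s, '^[[:punct:][:space:]]|[[:punct:][:space:]]$')
  -- Accepted case: clean first character and exactly one dot
  -- directly after a clean character at the very end.
  AND NOT REGEXP_LIKE(s, '^[^[:punct:][:space:]](.*[^[:punct:][:space:]])?\.$');
```

Splitting the rule into a broad condition and an explicit exception avoids enumerating the punctuation characters by hand, at the cost of evaluating two regular expressions per row.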
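Answer 1: (score: 0)
Because the predefined punctuation class is not suitable for the end of the string, a custom character class is used there instead. The dot is deliberately left out of it. The single quote is added separately (escaping it does not work, and finding a suitable delimiter for the q operator could be awkward in this case). The closing square bracket is also handled on its own, because Oracle does not seem to treat it properly when it is escaped. Finally, trailing consecutive dots are added explicitly: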
WITH T (id, col) AS (
SELECT 1, 'GEORGE & SON ' FROM DUAL UNION ALL
SELECT 2, '-GEORGE & SON' FROM DUAL UNION ALL
SELECT 3, '&GEORGE & SON' FROM DUAL UNION ALL
SELECT 4, 'GEORGE & SON..' FROM DUAL UNION ALL
SELECT 5, 'GEORGE & SON.' FROM DUAL UNION ALL
SELECT 6, '-GEORGE & SON S.A.' FROM DUAL UNION ALL
SELECT 7, 'GEORGE & SON!' FROM DUAL UNION ALL
SELECT 8, 'GEORGE & SON"' FROM DUAL UNION ALL
SELECT 9, 'GEORGE & SON#' FROM DUAL UNION ALL
SELECT 10, 'GEORGE & SON$' FROM DUAL UNION ALL
SELECT 11, 'GEORGE & SON%' FROM DUAL UNION ALL
SELECT 12, 'GEORGE & SON&' FROM DUAL UNION ALL
SELECT 13, 'GEORGE & SON(' FROM DUAL UNION ALL
SELECT 14, 'GEORGE & SON)' FROM DUAL UNION ALL
SELECT 15, 'GEORGE & SON*' FROM DUAL UNION ALL
SELECT 16, 'GEORGE & SON+' FROM DUAL UNION ALL
SELECT 17, 'GEORGE & SON,' FROM DUAL UNION ALL
SELECT 18, 'GEORGE & SON\' FROM DUAL UNION ALL
SELECT 19, 'GEORGE & SON-' FROM DUAL UNION ALL
SELECT 20, 'GEORGE & SON\' FROM DUAL UNION ALL
SELECT 21, 'GEORGE & SON/' FROM DUAL UNION ALL
SELECT 22, 'GEORGE & SON:' FROM DUAL UNION ALL
SELECT 23, 'GEORGE & SON;' FROM DUAL UNION ALL
SELECT 24, 'GEORGE & SON<' FROM DUAL UNION ALL
SELECT 25, 'GEORGE & SON=' FROM DUAL UNION ALL
SELECT 26, 'GEORGE & SON>' FROM DUAL UNION ALL
SELECT 27, 'GEORGE & SON?' FROM DUAL UNION ALL
SELECT 28, 'GEORGE & SON@' FROM DUAL UNION ALL
SELECT 29, 'GEORGE & SON[' FROM DUAL UNION ALL
SELECT 30, 'GEORGE & SON^' FROM DUAL UNION ALL
SELECT 31, 'GEORGE & SON_' FROM DUAL UNION ALL
SELECT 32, 'GEORGE & SON`' FROM DUAL UNION ALL
SELECT 33, 'GEORGE & SON{' FROM DUAL UNION ALL
SELECT 34, 'GEORGE & SON|' FROM DUAL UNION ALL
SELECT 35, 'GEORGE & SON}' FROM DUAL UNION ALL
SELECT 36, 'GEORGE & SON~' FROM DUAL UNION ALL
SELECT 37, 'GEORGE & SON''' FROM DUAL UNION ALL
SELECT 38, 'GEORGE & SON]' FROM DUAL)
SELECT
* FROM T
WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)')
ORDER BY id
;
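To also catch a special character that is followed by a single trailing dot, add an optional dot after the special-character set, changing
'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$'
to
'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$'
as in:
WITH T (id, col) AS (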
SELECT 40, 'GEORGE & SON^.' FROM DUAL UNION ALL
SELECT 41, 'GEORGE & SON_.' FROM DUAL UNION ALL
SELECT 42, 'GEORGE & SON`.' FROM DUAL UNION ALL
SELECT 43, 'GEORGE & SON{.' FROM DUAL UNION ALL
SELECT 44, 'GEORGE & SON|.' FROM DUAL UNION ALL
SELECT 45, 'GEORGE & SON}.' FROM DUAL UNION ALL
SELECT 46, 'GEORGE & SON~.' FROM DUAL UNION ALL
SELECT 47, 'GEORGE & SON''.' FROM DUAL UNION ALL
SELECT 48, 'GEORGE & SON].' FROM DUAL)
SELECT
* FROM T
WHERE REGEXP_LIKE(col, '([-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]\.?$')
ORDER BY id
;
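Originally, only leading and trailing occurrences were asked for... ;-)
Sequences of two or more whitespace/punctuation characters are caught by
[[:space:][:punct:]]{2,}
If you want to match these explicitly inside the string only, surround the class with word characters:
\w[[:space:][:punct:]]{2,}\w
Leading and trailing runs of whitespace are already matched as soon as a single whitespace character is found there, so there is no need to handle them explicitly.
This gives: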
WITH T (id, col) AS (
SELECT 50, 'GEORGE & SON ' FROM DUAL UNION ALL
SELECT 51, 'GEORGE & SON ' FROM DUAL UNION ALL
SELECT 52, ' GEORGE & SON' FROM DUAL UNION ALL
SELECT 53, ' GEORGE & SON' FROM DUAL UNION ALL
SELECT 54, 'GEORGE & SON' FROM DUAL UNION ALL
SELECT 55, 'GEORGE & SON S.A.' FROM DUAL UNION ALL
SELECT 56, 'GEORGE & SON S.A.' FROM DUAL UNION ALL
SELECT 60, ' GEORGE and SON' FROM DUAL UNION ALL
SELECT 61, ' ,GEORGE and SON' FROM DUAL UNION ALL
SELECT 62, ', GEORGE and SON' FROM DUAL UNION ALL
SELECT 63, 'GEORGE -- SON' FROM DUAL UNION ALL
SELECT 64, 'GEORGE --SON' FROM DUAL UNION ALL
SELECT 65, 'GEORGE & SON' FROM DUAL UNION ALL
SELECT 66, 'GEORGE + SON' FROM DUAL UNION ALL
SELECT 67, 'GEORGE and , SON' FROM DUAL UNION ALL
SELECT 68, 'GEORGE and , SON' FROM DUAL UNION ALL
SELECT 69, 'GEORGE and SON ,' FROM DUAL UNION ALL
SELECT 70, 'GEORGE and SON. ' FROM DUAL UNION ALL
SELECT 71, 'GEORGE and+-SON' FROM DUAL)
SELECT
* FROM T
-- WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|[[:space:][:punct:]]{2,}')
WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|\w[[:space:][:punct:]]{2,}\w')
ORDER BY id
;
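This does produce false positives, though, most notably GEORGE & SON. To some extent they can be avoided by replacing [:punct:] with a less inclusive set. The (final) choice will depend on whether false negatives or false positives are the greater concern.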
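As an illustration of a less inclusive set, the sketch below swaps [:punct:] for a short hand-picked list. The list [[:space:],;:+-] is an assumption made for this example only, not a recommendation from the answer; which characters belong in it depends on the data.

```sql
-- Sketch only: [[:space:],;:+-] is an assumed, narrower replacement
-- for [[:space:][:punct:]]; adjust the list to the data at hand.
WITH t AS (
  SELECT 63 AS id, 'GEORGE -- SON' AS col FROM dual UNION ALL  -- flagged
  SELECT 65, 'GEORGE & SON'    FROM dual UNION ALL             -- not flagged: & is not in the set
  SELECT 71, 'GEORGE and+-SON' FROM dual                       -- flagged
)
SELECT id, col
FROM t
WHERE REGEXP_LIKE(col, '\w[[:space:],;:+-]{2,}\w')
ORDER BY id;
```

Because & is neither in the set nor a word character, the run around it is never two listed characters flanked by word characters, so GEORGE & SON drops out of the result. The same mechanism produces a false negative for a name such as GEORGE && SON.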
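As mentioned before, false positives have to be balanced against false negatives, one way or the other. However, this may be a good moment to consider breaking the overall problem into smaller ones and handling them separately. Even if GEORGE and P. SON is perfectly acceptable, you may still want to look at, for example, -GEORGE and P. SON. So let's concentrate on stray character sequences in the middle of the string, also bearing in mind the & from earlier and allowing enumerations (and therefore commas):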
WHERE
REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
AND
NOT REGEXP_LIKE(col, ' [[:upper:]]\. \w')
AND
NOT INSTR(col, ', ') > 0
AND
NOT INSTR(col, ' & ') > 0
possibly followed by
WHERE
REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
AND
(REGEXP_LIKE(col, ' [[:upper:]]\. \w')
OR
INSTR(col, ', ') > 0
OR
INSTR(col, ' & ') > 0
)
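so that names such as GEORGE and SON can be found among the many valid ones. INSTR may be faster than a regular expression, depending on the overall situation...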
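(i) [[:punct:][:space:]] is essentially the union of [[:punct:]] and [[:space:]]. As far as selecting from the class is concerned, the order does not matter.
(ii)
[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']
is
[-!"#$%&()*+,\/:;<=>?@[^_`{|}~]
with the single quote added. If you try to put the quote in directly, Oracle takes it as the end of the argument value, and escaping the single quote with a backslash does not work... So basically, this is what "the single quote is added separately" above refers to.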
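Inside an Oracle string literal a single quote can also be written by doubling it. The sketch below relies on that to build the same pattern string without concatenation. The two sample rows are made up for illustration:

```sql
-- Sketch only: '' inside the literal stands for one single quote, so the
-- pattern string is identical to the one built with || '''' || above.
WITH t AS (
  SELECT 1 AS id, 'GEORGE & SON''' AS col FROM dual UNION ALL  -- ends with a single quote: matched
  SELECT 2, 'GEORGE & SON' FROM dual                           -- clean name: not matched
)
SELECT id, col
FROM t
WHERE REGEXP_LIKE(col, '[-!"#$%&()*+,\/:;<=>?@[^_`{|}~'']$')
ORDER BY id;
```

Both spellings yield the same regular expression, so choosing between them is mostly a matter of readability.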
If this needs adjusting or further detail, please leave a comment.