Regular expression

Time: 2018-09-19 12:43:44

Tags: regex oracle

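I have a table with a text column, and I need to select the rows in which the same word is repeated 3 or more times within a single sentence, where I- and I count as different words. Below is what I tried, but it does not work properly. A word can be a single letter or many letters. A sentence ends with one of the symbols (., !, ?).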

select *
from text
where regexp_like(text,
 q'~([^[:alpha:]-]|^)
([[:alpha:]]{2,}(-[[:alpha:]]{2,})?|-[[:alpha:]]{2,}|[[:alpha:]]{2,}-)
[^[:alpha:]-]((.*?[^[:alpha:]-])?\2([^[:alpha:]-]|$)){2,}~','ix');    

Sample text:

-bad girls, -bad boys,-bad phone. phone phone mam: phone phone? 
wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA . 
hustone, we have a problem, big problem. Very big, big, big
high cost - high perfomance, high 
full-hd,tv-full,full-hd:full-hd
Fooo fooo fooo , fooo-- fooo-- 
fooo feee faaa , fooo-fooo, fooo-fooo.
 a a a 
A. a a 

1 answer:

Answer 0 (score: 2)

You can use a regular expression that searches for:

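  • (^|[^[:alpha:]-]) the start of the string or a non-word character;
  • ([[:alpha:]-]+) then a word made up of one or more of your word characters (letters and hyphens);
  • (
    • [^[:alpha:]-.!?] followed by a non-word character that does not end a sentence;
    • ([^.!?]*[^[:alpha:]-.!?])? then, optionally, any number of non-sentence-ending characters followed by a non-word character that does not end a sentence;
    • \2 then the previously matched word
  • ){2} with that group repeated twice, which gives three occurrences of the word in total;
  • ($|[^[:alpha:]-]) and finally the end of the string or a non-word character.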

Like this:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE strings ( value ) AS
SELECT '-bad girls, -bad boys,-bad phone. phone phone mam: phone phone? ' FROM DUAL UNION ALL
SELECT 'wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA . ' FROM DUAL UNION ALL
SELECT 'hustone, we have a problem, big problem. Very big, big, big' FROM DUAL UNION ALL
SELECT 'high cost - high perfomance, high ' FROM DUAL UNION ALL
SELECT 'full-hd,tv-full,full-hd:full-hd' FROM DUAL UNION ALL
SELECT 'Fooo fooo fooo , fooo-- fooo-- ' FROM DUAL UNION ALL
SELECT 'fooo feee faaa , fooo-fooo, fooo-fooo.' FROM DUAL UNION ALL
SELECT ' a a a' FROM DUAL UNION ALL 
SELECT 'A. a a' FROM DUAL;

Query 1:

SELECT value,
       REGEXP_SUBSTR(
         value, 
         '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])',
         1,
         1,
         NULL,
         2
       ) As match
FROM   strings
WHERE  REGEXP_LIKE(
         value,
         '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])'
       )

Results

|                                                            VALUE |   MATCH |
|------------------------------------------------------------------|---------|
| -bad girls, -bad boys,-bad phone. phone phone mam: phone phone?  |    -bad |
|                 wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA .  |   OBAMA |
|      hustone, we have a problem, big problem. Very big, big, big |     big |
|                               high cost - high perfomance, high  |    high |
|                                  full-hd,tv-full,full-hd:full-hd | full-hd |
|                                                            a a a |       a |
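
The sentence rule in the pattern comes from excluding ., ! and ? from every character allowed between repetitions. As a minimal sketch of that behaviour, the query below runs the same pattern over two hypothetical literals that are not part of the sample data: one keeps all three occurrences in a single sentence, the other puts a full stop after the first occurrence. The column names txt and outcome are illustrative only. The intent is that only the first literal is reported as a match, since the second leaves just two occurrences after the full stop.

```sql
-- Hypothetical literals, not rows from the question's table.
SELECT txt,
       CASE
         WHEN REGEXP_LIKE(
                txt,
                '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])'
              )
         THEN 'match'
         ELSE 'no match'
       END AS outcome
FROM (
  -- three occurrences separated only by a comma and spaces
  SELECT 'big, big big' AS txt FROM DUAL UNION ALL
  -- a full stop separates the first occurrence from the other two
  SELECT 'big. big big' FROM DUAL
);
```

Note that no match parameter is passed, so the comparison is case-sensitive under the usual defaults; this is consistent with the 'Fooo fooo fooo' row being absent from the results above.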