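I have a table with a text column, and I need to select the rows in which the same word is repeated three or more times within a single sentence, where `I-` and `I` count as different words. Below is what I have tried, but it does not work. A word can be a single letter, but it can also have many letters. A sentence ends with one of the symbols `.`, `!`, or `?`.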
select *
from text
where regexp_like(text,
q'~([^[:alpha:]-]|^)
([[:alpha:]]{2,}(-[[:alpha:]]{2,})?|-[[:alpha:]]{2,}|[[:alpha:]]{2,}-)
[^[:alpha:]-]((.*?[^[:alpha:]-])?\2([^[:alpha:]-]|$)){2,}~','ix');
Sample text:
-bad girls, -bad boys,-bad phone. phone phone mam: phone phone?
wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA .
hustone, we have a problem, big problem. Very big, big, big
high cost - high perfomance, high
full-hd,tv-full,full-hd:full-hd
Fooo fooo fooo , fooo-- fooo--
fooo feee faaa , fooo-fooo, fooo-fooo.
a a a
A. a a
Answer 0 (score: 2)
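You can use a regular expression that searches for:
- `(^|[^[:alpha:]-])` the start of the string or a non-word character;
- `([[:alpha:]-]+)` then a word made up of your word characters (letters and hyphens);
- `(` the start of a group containing:
- `[^[:alpha:]-.!?]` a non-word character that is not a sentence end;
- `([^.!?]*[^[:alpha:]-.!?])?` then, optionally, any number of non-sentence-ending characters followed by a non-word character that is not a sentence end;
- `\2` then the previously matched word;
- `){2}` with that group repeated twice;
- `($|[^[:alpha:]-])` and, finally, the end of the string or a non-word character.
Like this: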
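As a reading aid, the pieces described above can be concatenated back into the full pattern. This is only a sketch for Oracle SQL: the alias `word_pattern` is an illustrative name that does not appear in the original answer, and the query does nothing except return the assembled string.

```sql
-- Sketch: build the pattern piece by piece so that each part carries a comment.
-- The alias word_pattern is hypothetical and used for illustration only.
SELECT '(^|[^[:alpha:]-])'            -- start of string or a non-word character
    || '([[:alpha:]-]+)'              -- the word itself (capture group 2)
    || '('                            -- open the repeated group
    || '[^[:alpha:]-.!?]'             -- one separator: non-word, not a sentence end
    || '([^.!?]*[^[:alpha:]-.!?])?'   -- optional filler that stays inside the sentence
    || '\2'                           -- the same word again
    || '){2}'                         -- two further occurrences, three in total
    || '($|[^[:alpha:]-])'            -- end of string or a non-word character
       AS word_pattern
FROM DUAL;
```

The concatenated string is the same one the query passes to `REGEXP_SUBSTR` and `REGEXP_LIKE`.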
Oracle 11g R2 Schema Setup:
CREATE TABLE strings ( value ) AS
SELECT '-bad girls, -bad boys,-bad phone. phone phone mam: phone phone? ' FROM DUAL UNION ALL
SELECT 'wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA . ' FROM DUAL UNION ALL
SELECT 'hustone, we have a problem, big problem. Very big, big, big' FROM DUAL UNION ALL
SELECT 'high cost - high perfomance, high ' FROM DUAL UNION ALL
SELECT 'full-hd,tv-full,full-hd:full-hd' FROM DUAL UNION ALL
SELECT 'Fooo fooo fooo , fooo-- fooo-- ' FROM DUAL UNION ALL
SELECT 'fooo feee faaa , fooo-fooo, fooo-fooo.' FROM DUAL UNION ALL
SELECT ' a a a' FROM DUAL UNION ALL
SELECT 'A. a a' FROM DUAL;
Query 1:
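SELECT value,
       REGEXP_SUBSTR(
         value,
         '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])',
         1,     -- start searching at the first character
         1,     -- return the first occurrence
         NULL,  -- no match parameters, so matching stays case-sensitive
         2      -- return capture group 2, the repeated word
       ) AS match
FROM   strings
WHERE  REGEXP_LIKE(
         value,
         '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])'
       );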
Results:
| VALUE | MATCH |
|------------------------------------------------------------------|---------|
| -bad girls, -bad boys,-bad phone. phone phone mam: phone phone? | -bad |
| wup, wup, BORAK OBAMA OBAMA MAMA; it is OBAMA . | OBAMA |
| hustone, we have a problem, big problem. Very big, big, big | big |
| high cost - high perfomance, high | high |
| full-hd,tv-full,full-hd:full-hd | full-hd |
| a a a | a |
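
Three of the nine sample rows are absent from the result. Going by the pattern, the likely reasons are that matching is case-sensitive (`Fooo` differs from `fooo`, `A` from `a`), that `fooo--` and `fooo-fooo` count as words distinct from `fooo`, and that `A.` ends its sentence. To see every row with a flag instead of filtering, a variant along the following lines may help. It is a sketch that assumes the `strings` table from the schema setup. The CTE name `p`, the alias `word_pattern`, and the column `has_triple` are illustrative names, not part of the original answer.

```sql
-- Sketch: keep the pattern in one place and flag every row instead of filtering.
-- p, word_pattern and has_triple are hypothetical names chosen for illustration.
WITH p AS (
  SELECT '(^|[^[:alpha:]-])([[:alpha:]-]+)([^[:alpha:]-.!?]([^.!?]*[^[:alpha:]-.!?])?\2){2}($|[^[:alpha:]-])'
           AS word_pattern
  FROM   DUAL
)
SELECT s.value,
       -- 'Y' when some word occurs three times within a single sentence
       CASE WHEN REGEXP_LIKE(s.value, p.word_pattern) THEN 'Y' ELSE 'N' END AS has_triple,
       -- capture group 2 is the repeated word; NULL when nothing matches
       REGEXP_SUBSTR(s.value, p.word_pattern, 1, 1, NULL, 2) AS match
FROM   strings s
CROSS JOIN p;
```

Holding the pattern in a single-row CTE avoids repeating the long literal, so a later change to the word or sentence-end characters only has to be made once.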