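I am fixing some text in Oracle. The problem is that sentences in my data run together, with no space separating them. For example:
Sentence without space.Between sentences
Sentence with question mark?Second sentence
I tested the following replace statement in regex101 and it seems to solve the problem, but I can't figure out why it doesn't work in Oracle: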
regexp_replace(review_text, '([^\s\.])([\.!\?]+)([^\s\.\d])', '\1\2 \3')
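This should let me find sentence-ending periods, exclamation points, and question marks (single or grouped) and add the necessary space between sentences. I realize there are other ways sentences can be separated, but the above should cover the vast majority of use cases. The \d in the third capture group is there to make sure I don't accidentally change numeric values such as "4.5" into "4. 5".
Test set before: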
Sentence without space.Between sentences
Sentence with space. Between sentences
Sentence with multiple periods...Between sentences
False positive sentence with 4.5 Liters
Sentence with!Exclamation point
Sentence with!Question mark
After the change it should look like this:
Sentence without space. Between sentences
Sentence with space. Between sentences
Sentence with multiple periods... Between sentences
False positive sentence with 4.5 Liters
Sentence with! Exclamation point
Sentence with! Question mark
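Regex101 link: https://regex101.com/r/dC9zT8/1
All of the changes work as expected in regex101, but in Oracle my third and fourth test cases do not. Oracle does not add a space after the group of multiple periods (the ellipsis), and regexp_replace adds a space inside "4.5". I'm not sure why this happens, but I am probably missing some peculiarity of Oracle's regexp_replace.
Any and all insight is appreciated. Thanks!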
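Answer 0: (score: 2)
This may get you started. It looks for ., ? and ! in any combination, followed by zero or more spaces and a capital letter, and it replaces the "zero or more spaces" with exactly one space. This will not split decimal numbers, but it will miss sentences that begin with anything other than a capital letter. You can start adding conditions from there; write back if you get stuck and we will try to help. Referring to other regex dialects may be helpful, but it is probably not the quickest way to get an answer.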
with
  inputs ( str ) as (
    select 'Sentence without space.Between sentences' from dual union all
    select 'Sentence with space. Between sentences' from dual union all
    select 'Sentence with multiple periods...Between sentences' from dual union all
    select 'False positive sentence with 4.5 Liters' from dual union all
    select 'Sentence with!Exclamation point' from dual union all
    select 'Sentence with!Question mark' from dual
  )
select regexp_replace(str, '([.!?]+)\s*([A-Z])', '\1 \2') as new_str
from   inputs;
NEW_STR
-------------------------------------------------------
Sentence without space. Between sentences
Sentence with space. Between sentences
Sentence with multiple periods... Between sentences
False positive sentence with 4.5 Liters
Sentence with! Exclamation point
Sentence with! Question mark
6 rows selected.
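As for why the pattern from the question behaves differently in Oracle, one likely cause (an assumption here, not something the query above tests) is that Oracle treats a backslash inside a bracket expression as a literal character. Under that reading, [^\s\.] excludes the letter s and the backslash rather than whitespace, and [^\s\.\d] excludes the letter d rather than digits. That would account for both failures: "periods" ends in s, so the ellipsis case never matches, and 5 is accepted by the third group, so "4.5" gets split. A minimal sketch of the question's pattern with POSIX character classes swapped in, reusing the same test inputs:

```sql
with
  inputs ( str ) as (
    select 'Sentence without space.Between sentences' from dual union all
    select 'Sentence with space. Between sentences' from dual union all
    select 'Sentence with multiple periods...Between sentences' from dual union all
    select 'False positive sentence with 4.5 Liters' from dual union all
    select 'Sentence with!Exclamation point' from dual union all
    select 'Sentence with!Question mark' from dual
  )
-- Assumption: \s and \d are not honored inside [...] in Oracle, so
-- [[:space:]] and [[:digit:]] stand in for them within the brackets.
-- Group 1: last character of the sentence (not whitespace, not a period)
-- Group 2: one or more of . ! ?
-- Group 3: first character of the next sentence (not whitespace, digit, or period)
select regexp_replace(str,
                      '([^[:space:].])([.!?]+)([^[:space:][:digit:].])',
                      '\1\2 \3') as new_str
from   inputs;
```

If the assumption holds, this should give the expected results listed in the question, and unlike the capital-letter version it should also catch sentences that start with a lowercase letter. It is still worth checking against the real review_text data before relying on it.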