Question

I have this simple regex:

RegEx_Seek_1 := TDIPerlRegEx.Create{$IFNDEF DI_No_RegEx_Component}(nil){$ENDIF};
s1 := '(doesn''t|don''t|can''t|cannot|shouldn''t|wouldn''t|couldn''t|haven''t|hadn''t)';
// s1 contains this text: (doesn't|don't|can't|cannot|shouldn't|wouldn't|couldn't|haven't|hadn't)
RegEx_Seek_1.MatchPattern := '(*UCP)(?m)'+s1+' (a |the )(ear|law also|multitude|son)(?(?= of)( \* | \w+ )| )([^»Ô¶ ][^ »Ô¶]\w*)';

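The goal is to find nouns in the text that may be followed by "of". If there is an "of", I then need to match a noun, \w+ (or a literal \*). The last word should be the verb.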

Sample text:

. some text . Doesn't the ear try ...
. some text doesn't the law also say ...
. some text doesn't the son bear ...
. some text . Shouldn't the multitude of words be answered? ...
. some text . Why doesn't the son of * come to eat ...

My results:

Doesn't the ear try
doesn't the law also say
doesn't the son bear
Shouldn't the multitude of words

It does not match the last sentence: doesn't the son of * come

My plan is to put \K before the last word so that only the verb is matched.

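Excluded characters: I use [^»Ô¶] because », Ô and ¶ are already used in the text as markers for existing verbs. They may or may not be present. I use spaces as word separators; tabs are delimiters and do not belong to any sentence.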

In this regex I added a space to the class, [^»Ô¶ ], to get the last word.
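The following is a small Python sketch, not Delphi, of which whole tokens the final group [^»Ô¶ ][^ »Ô¶]\w* can match. Python's re module is only a stand-in for PCRE here; with str patterns its \w is Unicode-aware, roughly as (*UCP) makes it in PCRE. The helper name is_candidate_verb is made up for this example.

```python
import re

# The final group from the question: a first character that is not a marker
# or a space, a second character that is not a space or a marker, then \w*.
LAST_WORD = re.compile(r"[^»Ô¶ ][^ »Ô¶]\w*")


def is_candidate_verb(token: str) -> bool:
    """Return True if the final group can match the whole token."""
    return LAST_WORD.fullmatch(token) is not None


for token in ["come", "be", "¶come", "»say", "a", "* come"]:
    print(f"{token!r}: {is_candidate_verb(token)}")
```

Tokens that already start with a marker (¶come, »say) are rejected, which is the intent. Note also that the two leading classes require at least two characters, so a one-letter word such as "a" can never be the last word, and "* come" fails because its second character is a space.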

So the question is: how can I correct the regex so that it also matches this line: doesn't the son of * come

Edit:

For the replacement, the verb must always be captured in the same group, because I will reference the verb there.

Answer 1

Your error is in (?(?= of)( \* | \w+ )| ).

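Remember that a lookahead does not move the cursor forward, so ( \* | \w+ ) matches " of ", and the rest, "* come", can then no longer be matched by ([^»Ô¶ ][^ »Ô¶]\w*), because its second character is a space.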
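To see this on the sample text, here is a Python sketch of the question's pattern, not the Delphi code itself. Assumptions: Python's re has no (?(?=...)yes|no) conditional, so it is written as (?:(?= of)yes|(?! of)no), which likewise can only succeed through the branch the lookahead selects; re.IGNORECASE is used because the reported results include "Doesn't" and "Shouldn't", which the lowercase word list only matches case-insensitively; (*UCP) and (?m) are left out, since (?m) only affects ^ and $. The helper first_matches is hypothetical.

```python
import re

# The contractions from the question's s1.
S1 = r"(doesn't|don't|can't|cannot|shouldn't|wouldn't|couldn't|haven't|hadn't)"

# The question's pattern. (?(?= of)A|B) is rewritten as (?:(?= of)A|(?! of)B).
# Neither lookahead consumes any text, so A starts right before " of".
ORIGINAL = re.compile(
    S1 + r" (a |the )(ear|law also|multitude|son)"
    r"(?:(?= of)( \* | \w+ )|(?! of) )"
    r"([^»Ô¶ ][^ »Ô¶]\w*)",
    re.IGNORECASE,  # assumption: the question's results imply caseless matching
)

SAMPLE = [
    ". some text . Doesn't the ear try ...",
    ". some text doesn't the law also say ...",
    ". some text doesn't the son bear ...",
    ". some text . Shouldn't the multitude of words be answered? ...",
    ". some text . Why doesn't the son of * come to eat ...",
]


def first_matches(pattern, lines):
    """Return the first whole match on each line, or None if the line has none."""
    result = []
    for line in lines:
        m = pattern.search(line)
        result.append(m.group(0) if m else None)
    return result


for found in first_matches(ORIGINAL, SAMPLE):
    print(found)
```

The last line finds no match: the lookahead sees " of", ( \* | \w+ ) then consumes " of " itself, and "* come" cannot satisfy the final group. The multitude line only matches because "words" is taken as the last word, so even that result stops short of the verb "be".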

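I think you should consume " of" inside your condition, so that ([^»Ô¶ ][^ »Ô¶]\w*) starts at the verb, e.g. (?(?= of) of( \* | \w+ )| )([^»Ô¶ ][^ »Ô¶]\w*)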
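A Python sketch of that suggestion, not the Delphi code itself. Because Python's re lacks lookahead conditionals, the conditional is written as (?:(?= of) of( \* | \w+ )|(?! of) ); re.IGNORECASE is assumed because the results in the question include "Doesn't"; and the final group is named verb so that the verb is always in one group, as the question's edit asks. The name verb and the helper find_verbs are choices made for this sketch.

```python
import re

# The contractions from the question's s1.
S1 = r"(doesn't|don't|can't|cannot|shouldn't|wouldn't|couldn't|haven't|hadn't)"

# The yes-branch now consumes " of" itself before matching " * " or " noun ",
# so the verb group starts right at the verb.
FIXED = re.compile(
    S1 + r" (a |the )(ear|law also|multitude|son)"
    r"(?:(?= of) of( \* | \w+ )|(?! of) )"
    r"(?P<verb>[^»Ô¶ ][^ »Ô¶]\w*)",
    re.IGNORECASE,  # assumption: caseless matching, as the question's results imply
)

SAMPLE = [
    ". some text . Doesn't the ear try ...",
    ". some text doesn't the law also say ...",
    ". some text doesn't the son bear ...",
    ". some text . Shouldn't the multitude of words be answered? ...",
    ". some text . Why doesn't the son of * come to eat ...",
]


def find_verbs(lines):
    """Return the captured verb on each line, or None if the line has no match."""
    verbs = []
    for line in lines:
        m = FIXED.search(line)
        verbs.append(m.group("verb") if m else None)
    return verbs


print(find_verbs(SAMPLE))
```

Each line now yields its verb, including "be" for the multitude line, because " of" is consumed inside the yes-branch and the noun or * is followed by exactly one space before the verb.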

Answer 2

I modified Wiktor's pattern as follows:

(*UCP)(?m)'+s1+' (a |the )(ear|law also|multitude|son)(?:\s+of Words|\s+of \*)*\s+\K(?P<verb>[^\s»Ô¶]+)

Now I can refer to the last group like this:

char(182)+'$<verb>'
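Python's re module does not support \K, so the sketch below emulates this pattern by capturing everything before the verb in an extra group, here called head (a name invented for this example), and writing it back in front of ¶ and the verb. Going by the results shown in this answer, that has the same effect as the Delphi replacement. re.IGNORECASE is an assumption: it is what lets the lowercase word list match "Doesn't" and lets "of Words" match "of words"; the question does not show how the TDIPerlRegEx options are set. The function name mark_verbs is hypothetical.

```python
import re

# The contractions from the question's s1.
S1 = r"(doesn't|don't|can't|cannot|shouldn't|wouldn't|couldn't|haven't|hadn't)"

# The pattern above, with \K replaced by the hypothetical "head" group.
MARK_VERB = re.compile(
    r"(?P<head>" + S1 + r" (a |the )(ear|law also|multitude|son)"
    r"(?:\s+of Words|\s+of \*)*\s+)"
    r"(?P<verb>[^\s»Ô¶]+)",
    re.IGNORECASE,  # assumption: the Delphi object matches case-insensitively
)

PILCROW = chr(182)  # the same character as char(182) in the replacement: ¶


def mark_verbs(line: str) -> str:
    """Put ¶ directly in front of the verb, keeping the text before it."""
    return MARK_VERB.sub(lambda m: m.group("head") + PILCROW + m.group("verb"), line)


SAMPLE = [
    ". some text . Doesn't the ear try ...",
    ". some text doesn't the law also say ...",
    ". some text doesn't the son bear ...",
    ". some text . Shouldn't the multitude of words be answered? ...",
    ". some text . Why doesn't the son of * come to eat ...",
]

for line in SAMPLE:
    print(mark_verbs(line))
```

For the fifth sample line, for example, it returns ". some text . Why doesn't the son of * ¶come to eat ...".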

Here is how the verbs in my text are changed with the Replace2 function of TDIRegEx. You can see that it works:

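Why doesn't the son of * ¶come to eat

Doesn't the ear ¶try words

Why doesn't the son ¶bear

Doesn't the law also ¶say the same thing?

Shouldn't the multitude of words ¶be answered?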

Both answers, Wiktor's and Sebastian's, helped me solve the problem. Thanks.

Regex does not match the last word
