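How to exclude a group of alternatives from a regex subpattern?

Time: 2018-09-28 18:12:12

Tags: regex delphi

This question is very similar to my previous one: How to exclude a word from regex subpattern?

It is different, however: the previous question dealt with a single word, whereas this one concerns several words (alternatives) that I want to exclude.

Group 1: the list of words I want to exclude from the match: (thy|your|her|his|its|our|their|mine|yours|hers|ours|theirs|my|a|an|the) This is a list of possessive pronouns and articles.

Group 2: the list of words that the second group should match: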

(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|sang|sank|smote|spoke|stole|stank|strod|strove|swore|swam|took|threw|trod|woke|wore|wove|wrote)

Sample text

1) And Ôhe spoke to him
2) and spoke to his sons
3) his host, spoke to
4) and took of every
5) and * took a garment
6) And * took * his son
7) merchants fetched a drove of horses
8) ÔI am a rose
9) blossom like a rose
10) But a † rose out
11) that * rose up
12) and a bit
13) and Ôthey bit the people

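Expected positive matches:

1) And Ôhe spoke to him
2) and spoke to his sons
3) his host, spoke to
4) and took of every
5) and * took a garment
6) And * took * his son
11) that * rose up
13) and Ôthey bit the people

To be skipped: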

7) merchants fetched a drove of horses
8) ÔI am a rose
9) blossom like a rose
10) But a † rose out
12) and a bit

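This means that any word preceded by an article (a, an, the) should be skipped: I do not want to capture the word that follows, because it is not a verb. Likewise, if the word is preceded by a possessive pronoun such as yours, it is not a verb and must not be captured.

The pattern I have tried so far is: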

'(*UCP)\W\K(?|(?=(your|her|his|its|our|their|mine|yours|hers|ours|theirs|my|a|an|the)\b)()|(\w+)\b)\W\b(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|sang|sank|smote|spoke|stole|stank|strod|strove|swore|swam|took|threw|trod|woke|wore|wove|wrote)\b(?=\W)'

I also tried changing (\w+) to (\w+|\*):

https://regex101.com/r/d6YZYA/10

Please note:

The asterisk * stands for a noun. That is why I need to capture the word took in 5) and 6):

5) and * took a garment
6) And * took * his son

The symbol † marks a gerund (gerundium), not a noun.

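The current result is incorrect. All the articles and pronouns are captured, so words that are not verbs are wrongly identified as verbs.

2 answers:

Answer 0: (score: 2)

You can use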

(*UCP)(?<!\w)(?!(?:your?|hers?|his|its|ours?|theirs?|mine|my|an?|the)\b)(\w+|[*†]),?\s+(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|san[gk]|smote|spoke|stole|stank|strove|swore|swam|took|threw|s?trod|wo[krv]e|wrote)\b

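See the regex demo.

Notes:

  • The first \b is replaced with the unambiguous word boundary (?<!\w), because * and the gerund symbol † are non-word characters, and a \b in front of them would require a word char immediately to their left
  • There is a comma after host in sentence 3, so I added an optional ,? after the first capturing group
  • A conditional construct cannot exclude the words here; instead, a restrictive negative lookahead forbids the first capturing group from matching any of the words listed inside the lookahead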
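
The first note can be checked with a minimal sketch. It uses Python's re module, which is an assumption made only for illustration (the question targets Delphi/PCRE), and a shortened pattern that looks for took alone rather than the full verb list.

```python
import re

text = "and * took a garment"

# \b needs a word character on one side of the position. Between the space
# and "*" there is none, so the match can never start at the asterisk.
with_boundary = re.search(r"\b(\w+|[*†])\s+took\b", text)

# (?<!\w) only requires that no word character precedes the position,
# so the asterisk can be taken as group 1.
with_lookbehind = re.search(r"(?<!\w)(\w+|[*†])\s+took\b", text)

print(with_boundary)             # None
print(with_lookbehind.group(1))  # *
```

With \b the search finds nothing, while the lookbehind version captures the asterisk in group 1.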

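Pattern details

  • (*UCP) - makes all shorthand classes Unicode-aware
  • (?<!\w) - no word char is allowed immediately to the left of the current location
  • (?!(?:your?|hers?|his|its|ours?|theirs?|mine|my|an?|the)\b) - the match fails if one of the listed words (his, her, hers, etc.) appears as a whole word immediately to the right of the current location
  • (\w+|[*†]) - Group 1: one or more word chars, or * or †
  • ,?\s+ - an optional , followed by 1+ whitespace chars
  • (bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|san[gk]|smote|spoke|stole|stank|strove|swore|swam|took|threw|s?trod|wo[krv]e|wrote) - Group 2: any of the words (patterns) inside the group
  • \b - a word boundary (all the words above end with a word char, so \b is enough).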
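
To try the pattern outside regex101, here is a rough Python sketch run over the sample lines from the question. Python's re module is an assumption for illustration only: it does not accept the PCRE verb (*UCP), so that part is dropped (Python 3 string patterns already treat \w and \b as Unicode-aware). The helper name matched_lines is hypothetical.

```python
import re

# The answer's pattern minus the PCRE-only (*UCP) verb.
PATTERN = re.compile(
    r"(?<!\w)"
    r"(?!(?:your?|hers?|his|its|ours?|theirs?|mine|my|an?|the)\b)"
    r"(\w+|[*†]),?\s+"
    r"(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot"
    r"|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook"
    r"|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank"
    r"|san[gk]|smote|spoke|stole|stank|strove|swore|swam|took|threw|s?trod"
    r"|wo[krv]e|wrote)\b"
)

SAMPLE = """\
1) And Ôhe spoke to him
2) and spoke to his sons
3) his host, spoke to
4) and took of every
5) and * took a garment
6) And * took * his son
7) merchants fetched a drove of horses
8) ÔI am a rose
9) blossom like a rose
10) But a † rose out
11) that * rose up
12) and a bit
13) and Ôthey bit the people
"""


def matched_lines(sample):
    """Map each sample line number to (group 1, group 2) of its first match."""
    result = {}
    for line in sample.splitlines():
        number, _, body = line.partition(") ")
        match = PATTERN.search(body)
        if match:
            result[int(number)] = match.groups()
    return result


for number, groups in matched_lines(SAMPLE).items():
    print(number, groups)
```

In this sketch lines 7, 8, 9 and 12 are skipped as requested. Line 10 is still reported, because † is accepted as group 1; if it should be skipped as the question lists, changing group 1 to (\w+|\*) appears to be enough. Also note that your? covers you and your rather than yours, so yours? would mirror the question's list more closely.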

Answer 1: (score: 0)

I have now found a great article that explains how to use if-then-else conditionals in a regex.

https://regular-expressions.mobi/conditional.html?wlr=1

It explains the use of the if condition in detail.

So the basic syntax is:

(?(?=regex)then|else)

The syntax with alternation is:

(?(?=condition)(then1|then2|then3)|(else1|else2|else3))

That is really useful stuff!
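
For readers whose engine lacks lookaround conditionals, the (?(?=regex)then|else) form above can be approximated with plain alternation: pair the then branch with the lookahead and the else branch with its negation. The sketch below assumes Python's re module, which only supports conditionals on a group number or name, not on a lookahead. The condition \d and the branches \d+ and [a-z]+ are made-up placeholders, not taken from the question.

```python
import re

# Intended PCRE form (not accepted by Python's re): (?(?=\d)\d+|[a-z]+)
# Emulation: guard each branch with the lookahead or with its negation.
EMULATED = re.compile(r"(?:(?=\d)\d+|(?!\d)[a-z]+)")


def tokens(text):
    """Return digit runs where a digit comes next, lowercase runs otherwise."""
    return EMULATED.findall(text)


print(tokens("abc123 de4"))  # ['abc', '123', 'de', '4']
```

The emulation evaluates the condition twice when the first branch fails, which is harmless for a short lookahead like this one.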