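This question is very similar to my previous one: How to exclude a word from regex subpattern?
However, it is different: the previous question involved only a single word, whereas this one concerns several words (alternatives) that I want to exclude.
Group 1:
List of words I want to exclude from the match: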
(thy|your|her|his|its|our|their|mine|yours|hers|ours|theirs|my|a|an|the)
That is a list of possessive pronouns and articles.
Group 2:
List of words that the second group should match:
(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|sang|sank|smote|spoke|stole|stank|strod|strove|swore|swam|took|threw|trod|woke|wore|wove|wrote)
Sample text
1) And Ôhe spoke to him
2) and spoke to his sons
3) his host, spoke to
4) and took of every
5) and * took a garment
6) And * took * his son
7) merchants fetched a drove of horses
8) ÔI am a rose
9) blossom like a rose
10) But a † rose out
11) that * rose up
12) and a bit
13) and Ôthey bit the people
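Expected positive matches:
1) And Ôhe spoke to him
2) and spoke to his sons
3) his host, spoke to
4) and took of every
5) and * took a garment
6) And * took * his son
11) that * rose up
13) and Ôthey bit the people
To be skipped: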
7) merchants fetched a drove of horses
8) ÔI am a rose
9) blossom like a rose
10) But a † rose out
12) and a bit
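This means that any word preceded by an article (a, an, the) should be skipped: I do not want to capture the word that follows, because it is not a verb. Likewise, if a possessive pronoun such as yours precedes the word, it is not a verb and must not be captured.
The pattern I have tried so far is: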
'(*UCP)\W\K(?|(?=(your|her|his|its|our|their|mine|yours|hers|ours|theirs|my|a|an|the)\b)()|(\w+)\b)\W\b(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|sang|sank|smote|spoke|stole|stank|strod|strove|swore|swam|took|threw|trod|woke|wore|wove|wrote)\b(?=\W)'
I also tried changing (\w+) to (\w+|\*)
https://regex101.com/r/d6YZYA/10
Please note:
The asterisk * denotes a noun. That is why I need to capture the word took in 5) and 6).
5) and * took a garment
6) And * took * his son
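The symbol † denotes a gerundium, not a noun.
The current result is incorrect. I see that all the articles and pronouns are captured, so words that are not verbs are wrongly identified as verbs.
Answer 0 (score: 2)
You can use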
(*UCP)(?<!\w)(?!(?:yours?|hers?|his|its|ours?|theirs?|mine|my|an?|the)\b)(\w+|[*†]),?\s+(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|san[gk]|smote|spoke|stole|stank|strove|swore|swam|took|threw|s?trod|wo[krv]e|wrote)\b
See the regex demo.
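As a rough offline check, the same idea can be ported to Python's built-in re module. This is only a sketch under stated assumptions: Python is not the engine the pattern was written for, (*UCP) is dropped because re is already Unicode-aware for str patterns, and the names EXCLUDED, VERBS, PATTERN and find_verbs are made up for illustration.

```python
import re

# Possessive pronouns and articles that must not start the word before the verb.
# `yours?` covers both "your" and "yours"; `an?` covers "a" and "an".
EXCLUDED = r"(?:yours?|hers?|his|its|ours?|theirs?|mine|my|an?|the)"

# Verb forms for capture group 2, split over several lines for readability.
VERBS = (
    "bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|"
    "forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|"
    "overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|san[gk]|smote|"
    "spoke|stole|stank|strove|swore|swam|took|threw|s?trod|wo[krv]e|wrote"
)

# (?<!\w)        no word char immediately to the left
# (?!EXCLUDED\b) the preceding word must not be an excluded word
# (\w+|[*†])     group 1: a word, or one of the marker symbols
# ,?\s+          optional comma, then whitespace
# (VERBS)\b      group 2: one of the verbs, as a whole word
PATTERN = re.compile(rf"(?<!\w)(?!{EXCLUDED}\b)(\w+|[*†]),?\s+({VERBS})\b")


def find_verbs(line):
    """Return the verbs (group 2) found in one line of text."""
    return [m.group(2) for m in PATTERN.finditer(line)]


# A few of the sample lines from the question.
SAMPLES = [
    "And Ôhe spoke to him",
    "his host, spoke to",
    "and * took a garment",
    "merchants fetched a drove of horses",
    "and a bit",
    "and Ôthey bit the people",
]

if __name__ == "__main__":
    for line in SAMPLES:
        print(f"{line!r}: {find_verbs(line)}")
```

With these samples, a verb is reported for lines 1), 3), 5) and 13), and nothing is reported for 7) and 12).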
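Notes:
\b is replaced with the unambiguous word boundary (?<!\w), because * and the gerundium symbol are non-word chars, and a \b before them would require a word char to appear immediately to their left.
There is a comma after host, so I added an optional ,? after the first capturing group.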
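Pattern details
(*UCP) - makes all shorthand character classes Unicode-aware
(?<!\w) - no word char is allowed immediately to the left of the current position
(?!(?:yours?|hers?|his|its|ours?|theirs?|mine|my|an?|the)\b) - a negative lookahead that fails the match if one of the listed words (his, her, hers, etc.) appears as a whole word immediately to the right of the current position
(\w+|[*†]) - Group 1: one or more word chars, or * or †
,?\s+ - an optional comma, then 1+ whitespace chars
(bore|bade|bit|blew|chose|dove|drew|drove|drank|ate|fell|forbade|forgot|forgave|forsook|froze|got|gave|went|grew|hid|knew|lay|lit|mistook|overdid|overtook|overthrew|rode|rang|rose|saw|shook|shore|shrank|san[gk]|smote|spoke|stole|stank|strove|swore|swam|took|threw|s?trod|wo[krv]e|wrote) - Group 2: any of the words (patterns) inside this group
\b - a word boundary (all the words above end with a word char, so \b is sufficient).
Answer 1 (score: 0)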
I have now found a great article that explains how to use if-then-else conditionals in a regex.
https://regular-expressions.mobi/conditional.html?wlr=1
It explains the use of the if condition in detail.
So the basic syntax is:
(?(?=regex)then|else)
The syntax with alternation is:
(?(?=condition)(then1|then2|then3)|(else1|else2|else3))
That is really useful stuff!
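For readers who want to experiment outside PCRE, here is a small sketch of how the conditional behaves. Python's built-in re module only accepts a group reference as the condition, not a lookahead, so the sketch emulates (?(?=cond)then|else) with the alternation (?:(?=cond)then|(?!cond)else). The condition, the two branches, and the names EMULATED, PLAIN and match_token are all illustrative assumptions, not taken from the article.

```python
import re

# PCRE form being emulated (not accepted by Python's re):
#   (?(?=\d)\d{3}|\w+)
# If the next char is a digit, exactly three digits are required;
# otherwise any run of word chars is accepted.
EMULATED = re.compile(r"(?:(?=\d)\d{3}|(?!\d)\w+)")

# Plain alternation for comparison: it has no condition, so the second
# branch can still match when the first one fails.
PLAIN = re.compile(r"(?:\d{3}|\w+)")


def match_token(text):
    """Return text if the emulated conditional matches all of it, else None."""
    m = EMULATED.fullmatch(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for token in ["123", "abc", "12ab"]:
        print(token, match_token(token), bool(PLAIN.fullmatch(token)))
```

The input 12ab shows the difference: the condition holds (it starts with a digit), the then branch fails, and the else branch is never tried, whereas the plain alternation accepts it through \w+.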