Question

I want to identify specific patterns in a large block of text, using the C#/.NET regular expression library.

For example:

1. This camera support Monochrome, Neutral, Standard, Landscape and Portrait [...More words...] settings furnish advanced, personalized color control.
Output shall be: Array ["Monochrome", "Neutral", "Standard", "Landscape", "Portrait"]

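It should also avoid matching "advanced," and the word that follows it.

I am currently using the expression (([\S]+)( {0,3})?(,|and)), which returns all the words up to "and". Can you suggest an expression that also covers the word after "and"?

Cheers! Nilay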
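
The behaviour described in the question can be reproduced with a short script. This is a minimal sketch in Python's re module rather than .NET (the pattern syntax used here is the same in both); the sample text is an assumption that simply drops the elided middle of the sentence, and capture_words is a hypothetical helper name.

```python
import re

# Pattern exactly as given in the question.
QUESTION_PATTERN = r"(([\S]+)( {0,3})?(,|and))"

# Assumed sample text: the question's sentence without the elided part.
TEXT = ("This camera support Monochrome, Neutral, Standard, Landscape and "
        "Portrait settings furnish advanced, personalized color control.")

def capture_words(pattern, text):
    """Return group 2 (the word) of every match of pattern in text."""
    return [m.group(2) for m in re.finditer(pattern, text)]

print(capture_words(QUESTION_PATTERN, TEXT))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'advanced']
```

Portrait is missing because nothing that follows it is a comma or "and", and "advanced" appears because it is followed by a comma.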

Answer 1

Have you tried this?

 (([\S]+)( {0,3})?(,|and|\.))

http://regexr.com?355ci
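
To see what the added \. alternative changes, the pattern can be tried on two inputs. This is a sketch in Python's re module, not .NET; both sample strings and the helper name capture_words are assumptions made for illustration.

```python
import re

# Pattern from this answer: a full stop is accepted as a terminator too.
ANSWER_PATTERN = r"(([\S]+)( {0,3})?(,|and|\.))"

# Assumed inputs: one where the list ends the sentence, one where it does not.
LIST_ENDS_SENTENCE = ("This camera support Monochrome, Neutral, Standard, "
                      "Landscape and Portrait.")
LIST_MID_SENTENCE = ("This camera support Monochrome, Neutral, Standard, "
                     "Landscape and Portrait settings furnish advanced, "
                     "personalized color control.")

def capture_words(pattern, text):
    """Return group 2 (the word) of every match of pattern in text."""
    return [m.group(2) for m in re.finditer(pattern, text)]

print(capture_words(ANSWER_PATTERN, LIST_ENDS_SENTENCE))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'Portrait']
print(capture_words(ANSWER_PATTERN, LIST_MID_SENTENCE))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'advanced', 'control']
```

In this sketch the pattern picks up Portrait only when a full stop follows it directly; when more words follow, Portrait is still skipped.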

Answer 2

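Use lookaround

Found the correct answer.

Problem: when a match succeeds, the regex cursor advances past the text it consumed. In Monochrome, Neutral, Standard, Landscape and Portrait, the "and" is consumed as part of the match for Landscape, so the next match attempt cannot use it and Portrait is never captured. The right approach is to use lookahead and lookbehind, which test the surrounding text without consuming it.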

(?=( {0,1})?(,|and)) is the correct lookahead, and (?<=( {1,3}(and|or) {1,3})) is the correct lookbehind.
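
A rough illustration of the lookaround idea, sketched in Python's re module. Two things here are assumptions rather than the answer's exact expressions: Python requires a fixed-width lookbehind, so the variable-width {1,3} form that .NET accepts is replaced by a single space on each side of "and", and the "or" alternative is left out. The names LIST_WORD and list_words are hypothetical.

```python
import re

# First branch: a word followed by a comma (optionally after one space)
# or by " and ". Second branch: a word preceded by " and ".
# Neither lookaround consumes text, so "and" stays available for both.
LIST_WORD = re.compile(r"\w+(?= ?,| and )|(?<= and )\w+")

SENTENCE = ("This camera support Monochrome, Neutral, Standard, "
            "Landscape and Portrait settings.")

def list_words(text):
    """Return every word that sits next to a comma or the word 'and'."""
    return LIST_WORD.findall(text)

print(list_words(SENTENCE))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'Portrait']
```

Note that this sketch still matches any word followed by a comma, so a later "advanced," in a longer text would also be returned; it does not by itself limit matches to the list.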

Answer 3

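Matching the list is not very hard, but splitting it into a proper list is harder, and I suspect the mechanism I would use in Perl is language-dependent (I don't use Microsoft products, so I won't give it to you in C#).

In Perl, I would do something like the following. It is not a single-regex answer, but I think the code is clearer.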

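use strict;
use warnings;

my $string = "This camera support Monochrome, Neutral, Standard, Landscape and Portrait foo bar baz";

# qr// keeps \w intact; inside a double-quoted string "\w" would collapse to a plain "w".
my $re_sep  = qr/(?: {0,3}, {0,3}| {1,3}and {1,3})/;   # separator: a comma or the word "and"
my $re_list = qr/\w+(?:$re_sep\w+)+/;                   # two or more words joined by separators

# Grab the whole list first, then split it on the same separator.
my ($list) = $string =~ m/($re_list)/;
my @list_elements = split /$re_sep/, $list;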
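
For readers who do not use Perl, the same two-step idea (match the whole list, then split it on the separator) might look like this in Python. This is only a sketch under assumptions: extract_list is a hypothetical name, the sample text is the question's sentence without its elided middle, and only the first list found is returned.

```python
import re

# Separator: a comma with up to three spaces around it, or " and ".
RE_SEP = r"(?: {0,3}, {0,3}| {1,3}and {1,3})"
# A list: two or more words joined by separators.
RE_LIST = r"\w+(?:" + RE_SEP + r"\w+)+"

def extract_list(text):
    """Return the elements of the first comma/'and' list in text."""
    match = re.search(RE_LIST, text)
    if match is None:
        return []
    return re.split(RE_SEP, match.group(0))

TEXT = ("This camera support Monochrome, Neutral, Standard, Landscape and "
        "Portrait settings furnish advanced, personalized color control.")

print(extract_list(TEXT))
# ['Monochrome', 'Neutral', 'Standard', 'Landscape', 'Portrait']
```

Because only the first match is taken, the later "advanced, personalized" pair is left out in this example.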
