Regex to correctly handle nested patterns

Time: 2014-09-30 12:04:16

Tags: regex access-vba

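I'm trying to find the best way to handle a problem I've run into. I need to extract comments from a string, where a comment is defined as whatever sits between parentheses at the end of the string. There can be a single comment, several comments, nested comments, or a combination of these.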

Some examples:

this is a string (with comment)
this is another string (with comment)(and more comment)
this is yet another string (with comment (and some nested comment)

This is the simplest format, and it is fairly easy to split with the following regex (Access VBA):

regex.Pattern = "^([^(]*)(\(.*\))+$"

I get the following correct output, where group1 is the value and group2 is the comment:

group1: this is a string / group2: (with comment)
group1: this is another string / group2:  (with comment)(and more comment)
group1: this is yet another string / group2:  (with comment (and some nested comment)
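For anyone who wants to reproduce the grouping above outside Access, here is a minimal sketch. It is written in Python rather than VBA, purely so that it is easy to run; the pattern string is the one from the question, and the helper name `split_value_and_comment` is made up for illustration.

```python
import re

# The pattern from the question, copied unchanged.
PATTERN = re.compile(r"^([^(]*)(\(.*\))+$")

def split_value_and_comment(text):
    """Hypothetical helper: return (value, comment), or None if there is no match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)

samples = [
    "this is a string (with comment)",
    "this is another string (with comment)(and more comment)",
    "this is yet another string (with comment (and some nested comment)",
]
for sample in samples:
    print(split_value_and_comment(sample))
```

Note that group 1 keeps the trailing space before the first parenthesis, so it probably needs trimming before use.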

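The problem is that in some cases I have arrays, and those should fail. An array can be delimited by a comma or a slash. Simple enough, but these tokens can also be used for other purposes. So if a comma or slash is found in the string, it is treated as an array, unless: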

- the token is within the comment
- the slash is part of a fractional number

Some examples:

this is string1 with a fractional 1/4 number (with comment)
this is string1 (with a fractional 1/4 in comment)
this is string1 (with comment1) / this is string2 (with comment2)
this is string1 (with some data, separated by a comma) , this is string2 (with comment3 / comment4)
this is string1 (with a fractional 1/4) / this is string2 (with comment2,comment3)

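Added examples: the first should fail because it contains an array token (a slash) that is not part of a fraction. The second selects too much: it should only take the last comment, not the whole string from the first comment through the second.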

this is string1 without comment / this is string2 (with comment2)
This is a  string (with subcomment) where only the last should be selected (so this one)

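How can I best adjust the logic so that the match fails on arrays unless the comma or slash falls under one of the exceptions? I ended up with monster code, so I wanted to see whether there is a simpler option. The exceptions above should end up as follows: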

ex1 / group1 : this is string1 with a fractional 1/4 number group2: (with comment)
ex2 / group1 : this is string1 group2 : (with a fractional 1/4 in comment)
ex3 to 5 should fail as they are considered arrays and need some additional logic
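The array rule described above can also be checked without a regex, which may help when testing a candidate pattern. The following is only a sketch in Python under stated assumptions, not something taken from the question: `has_array_token` is a hypothetical helper, a "fraction" is assumed to mean a slash with a digit directly on each side, and anything inside parentheses is assumed to be a comment.

```python
def has_array_token(text):
    """Hypothetical helper: True if a comma or slash acts as an array separator.

    Assumptions (not from the question): tokens inside parentheses are ignored,
    and a slash counts as a fraction only when a digit sits directly on both sides.
    """
    depth = 0  # current parenthesis nesting level
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced closing parentheses
        elif depth == 0 and ch == ",":
            return True
        elif depth == 0 and ch == "/":
            prev_is_digit = i > 0 and text[i - 1].isdigit()
            next_is_digit = i + 1 < len(text) and text[i + 1].isdigit()
            if not (prev_is_digit and next_is_digit):
                return True
    return False

print(has_array_token("this is string1 with a fractional 1/4 number (with comment)"))
print(has_array_token("this is string1 (with comment1) / this is string2 (with comment2)"))
```

Under these assumptions the first call prints False (ex1 is not an array) and the second prints True (ex3 is an array).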

Hope that's somewhat clear...

1 Answer:

Answer 0 (score: 1)

I think you want something like this:

^((?:(?!\)\s*[,\/]).)*?)(\([^()]*\))$
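A quick way to see what this pattern accepts and rejects is to run it against the question's examples. The sketch below uses Python only for convenience. The pattern is copied unchanged and uses nothing beyond a negative lookahead, so it should carry over to the VBA regex object. `try_first` is a hypothetical helper name.

```python
import re

# The first pattern from the answer, copied unchanged.
FIRST_PATTERN = re.compile(r"^((?:(?!\)\s*[,\/]).)*?)(\([^()]*\))$")

def try_first(text):
    """Hypothetical helper: (value, last_comment) on a match, None otherwise."""
    match = FIRST_PATTERN.match(text)
    return None if match is None else (match.group(1), match.group(2))

cases = [
    "this is string1 with a fractional 1/4 number (with comment)",
    "this is string1 (with comment1) / this is string2 (with comment2)",
    "this is string1 without comment / this is string2 (with comment2)",
]
for case in cases:
    print(try_first(case))
```

The first case matches and the second is rejected, because a closing parenthesis is followed by a slash. The third case still matches, since the slash is not preceded by a closing parenthesis. That appears to be the situation the added example in the question is pointing at.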

Update:

^(?=(?:(?!\)\s*[,\/]|\s\/\s).)*$)(.*?)((?:\([^()\n]*\))+)$
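To check the updated pattern against the examples from the question, here is a small harness. It is a sketch in Python rather than VBA, with the pattern copied unchanged. `split_updated` is a hypothetical helper name. The leading lookahead rejects any string that contains a closing parenthesis followed by a comma or slash, or a slash with whitespace on both sides. The rest lazily captures the value and then one or more non-nested comments at the end.

```python
import re

# The updated pattern from the answer, copied unchanged.
UPDATED = re.compile(
    r"^(?=(?:(?!\)\s*[,\/]|\s\/\s).)*$)(.*?)((?:\([^()\n]*\))+)$"
)

def split_updated(text):
    """Hypothetical helper: (value, comments) on a match, None on rejection."""
    match = UPDATED.match(text)
    return None if match is None else (match.group(1), match.group(2))

cases = [
    "this is another string (with comment)(and more comment)",
    "this is string1 (with a fractional 1/4 in comment)",
    "this is string1 without comment / this is string2 (with comment2)",
    "This is a  string (with subcomment) where only the last should be selected (so this one)",
]
for case in cases:
    print(split_updated(case))
```

With these inputs the array example is rejected and the last example yields only "(so this one)" as the comment. One thing to be aware of: because the lookahead scans the whole string, a slash with spaces around it inside a comment, as in "(with comment3 / comment4)", also causes a rejection. That may or may not be what the question intends.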
