Question

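Using a regular expression, I need to collect the valid top-level tokens {...} while ignoring token boundaries inside quoted strings "..." (including any escaped quote "").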

Simplified sample:

TEXT{bbbbb}TEXT{cccc|{dddd}}TEXT{eeee|ff{gg}hh|ii{jj}"kk}{|{}ll""mm{nn}"oo|{pppp}}TEXT

Expected: 3 matches:

{bbbbb}
{cccc|{dddd}}
{eeee|ff{gg}hh|ii{jj}"kk}{|{}ll""mm{nn}"oo|{pppp}}
Note that the content of the string "kk}{|{}ll""mm{nn}" is ignored.

Each token should follow this syntax:

'{'<tokenName>[a1]['|'[a2|c1]('='|'<>')c2]['|'c3['|'c4]]'}'

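where aX is a simple regex ((,-?\d+(:.*)?|:.*)) and cX can contain matched { - } pairs, plain text, and strings "..." in which special characters such as {, }, |, and "" are treated as plain text.

I don't understand the balancing and escaping this regex needs. It may be a tough task for a relative beginner.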

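Additional details:

I have already done part of the solution; my problem is the balancing and the quoting.

I am trying to create extended tokens, contained in input As String, that are similar to format strings but also allow conditional evaluation. For example, these tokens could be used in a file name template so that a user can configure the custom part of a file name in batch processing. A token has an alphanumeric name and formatting parts (including an optional condition, a true part, and a false part). The formatting parts can be in: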

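"Native format": as known from String.Format(), but with an alphanumeric name instead of the placeholders {0}, {1}, {2}, ...
The regex is (,-?\d+(:.*)?|:.*).
Examples:
,3 (translated to the standard {0,3})
:d (translated to the standard {0:d})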
,-3:d (translated to the standard {0,-3:d})
"Complex format": either the native format,
or the fixed literal {0} embedded in a string, as in ...{0<native format>}.... The regex is: "(" & nativeFormatSpec & "|.*({0.*}.*)*)".
Examples:
,-4:d (native format only)
prefix {0,-4:d} & once more {0} suffix, which is translated to String.Format("prefix {0,-4:d} & once more {0} suffix", value)
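
To make the native-format translation above concrete, here is a minimal VB.NET sketch. It assumes the translation is nothing more than wrapping the spec in "{0" and "}"; the helper name ToStandardFormat is hypothetical and does not come from the original code.

```vb
Imports System

Module NativeFormatDemo
    ' Hypothetical helper (not part of the original code):
    ' wraps a native format spec such as ",3" into the standard "{0,3}".
    Function ToStandardFormat(spec As String) As String
        Return "{0" & spec & "}"
    End Function

    Sub Main()
        ' Brackets make the padding visible.
        Console.WriteLine("[" & String.Format(ToStandardFormat(",3"), 7) & "]")     ' prints [  7]
        Console.WriteLine("[" & String.Format(ToStandardFormat(",-3:d"), 7) & "]")  ' prints [7  ]
    End Sub
End Module
```

A positive alignment right-aligns the value in the given width and a negative one left-aligns it, which is standard String.Format() behavior.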

The whole syntax of a valid token is now:

'{'<tokenName>[quickFormat]['|'[complexFmt1]('='|'<>')value][complexFmt2|[complexFmt3]]'}'

The code I use relies on many named groups, but the regex is probably too simplistic:

'*** matching token names (for later use as match.Group(groupName))
Const tokenGroup As String = NameOf(tokenGroup)
Const compareFormatGroup As String = NameOf(compareFormatGroup)
Const quickFormatGroup As String = NameOf(quickFormatGroup)
Const compareOperatorGroup As String = NameOf(compareOperatorGroup)
Const compareValueGroup As String = NameOf(compareValueGroup)
Const defaultFormatGroup As String = NameOf(defaultFormatGroup)
Const elseFormatGroup As String = NameOf(elseFormatGroup)

'*** subpatterns
Const nativeFormatSpec As String = "(,-?\d+(:.*)?|:.*)"
Const complexFormatSpec As String = "(" & nativeFormatSpec & "|.*({0.*}.*)*)" 'value allowing one token {0} multiple times

Dim matches As MatchCollection = Regex.Matches(input,
        $"\{{(?<{tokenGroup}>{Regex.Escape(token)})(?<{quickFormatGroup}>{nativeFormatSpec}?)" &
        $"((\|(?<{compareFormatGroup}>{complexFormatSpec}))?(?<{compareOperatorGroup}>=|!=|<>)(?<{compareValueGroup}>.*))?" &
        $"(\|(?<{defaultFormatGroup}>{complexFormatSpec})(\|(?<{elseFormatGroup}>{complexFormatSpec}))?)?\}}")

Answer 1

Based on your simplified data, here is a regex that should handle the extraction:

\{(?>(?:"[^"]*(?:""[^"]*)*"|[^{}]+)|\{(?<n>)|\}(?<-n>))*(?(n)(?!))\}

See the demo.

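This is basically a balanced-braces regex combined with a regex that matches VB.NET-style string literals, "[^"]*(?:""[^"]*)*". Note that the part (?:"[^"]*(?:""[^"]*)*"|[^{}]+), which matches quoted strings and runs of non-brace characters, is effectively skipped when looking for paired braces.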
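
As an illustration, the following VB.NET sketch runs the pattern against the simplified sample from the question. The module name and variable names are assumptions made for this example; only the pattern and the sample text come from the post. Inside VB string literals every " is doubled, which is why the pattern looks denser here.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TokenDemo
    Sub Main()
        ' Sample from the question; each " is written as "" in a VB literal.
        Dim input As String =
            "TEXT{bbbbb}TEXT{cccc|{dddd}}TEXT{eeee|ff{gg}hh|ii{jj}""kk}{|{}ll""""mm{nn}""oo|{pppp}}TEXT"

        ' \{(?<n>) pushes onto the stack n for every inner opening brace,
        ' \}(?<-n>) pops for every inner closing brace, and (?(n)(?!))
        ' fails the match if any inner brace is still open at the end.
        Dim pattern As String =
            "\{(?>(?:""[^""]*(?:""""[^""]*)*""|[^{}]+)|\{(?<n>)|\}(?<-n>))*(?(n)(?!))\}"

        For Each m As Match In Regex.Matches(input, pattern)
            Console.WriteLine(m.Value)
        Next
        ' Prints:
        ' {bbbbb}
        ' {cccc|{dddd}}
        ' {eeee|ff{gg}hh|ii{jj}"kk}{|{}ll""mm{nn}"oo|{pppp}}
    End Sub
End Module
```

One caveat, offered as an observation rather than something stated in the answer: [^{}]+ also consumes quote characters, so a string that directly follows plain text (for example {a"}"}) would not be recognized as a string. In the sample every quoted string follows a closing brace, so this does not show up. Excluding the quote from that class, as in [^{}"]+, may be safer for real input.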
