PCRE regex - match everything up to the first pipe not enclosed in square brackets

Time: 2017-07-18 11:19:06

Tags: regex pcre

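I have the following line of text, and I am trying to extract everything up to the first pipe character that is not enclosed in square brackets.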

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name

Expected output:

action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"

That is, everything except the trailing | stats values(savedsearch_name) AS search_name

Based on some lookaround examples, I can (almost) get what I need with this JavaScript regular expression:

/.*\|(?![^\[]*\])/g - http://refiddle.com/refiddles/596dec4c75622d608f290000

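But this doesn't translate well into a PCRE-compatible expression (plus I would like to capture everything up to, but not including, the first pipe).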

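From what I have read, the nested square brackets inside the first bracketed group may be an insurmountable complication? There is only ever one level of nested brackets within any given group (e.g. [..[]..][..[]..[]..]).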

I'll admit I haven't really got my head around positive and negative lookarounds, but any help would be greatly appreciated!

1 answer:

Answer 0: (score: 0)

In this kind of situation, matching everything that isn't the delimiter works better than trying to split:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*


Details:

(?=[^|]) # lookahead: ensure there is at least one non-pipe character at the
         # current position; the goal is to avoid an empty match.
[^][|]* # all that isn't a bracket or a pipe
(?:
    (  # open the capture group 1: describe a bracket part
        \[
         [^][]*+ # all that isn't a bracket (note that you don't have to care
                 # about the pipe here, since you are between brackets)
         (?:
             (?1)  # refer to the capture group 1 subpattern (it's a recursion
                   # since this reference is in the capture group 1 itself)
             [^][]* 
         )*+
         ]
    ) # close the capture group 1
    [^][|]*
)*
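
For readers who want to try the same idea outside PCRE, here is a minimal sketch in Python. It is a port under stated assumptions, not the pattern above verbatim: Python's standard `re` module has no `(?1)` recursion, so the nested level is written out by hand, which only covers the single level of nesting described in the question. The names `BRACKETED`, `FIRST_PART` and `before_first_top_level_pipe`, and the short sample string, are made up for illustration.

```python
import re

# One bracketed group whose content may contain bracket pairs one level deep.
# Assumption: nesting never goes deeper than that, as stated in the question.
BRACKETED = r"\[[^\[\]]*(?:\[[^\[\]]*\][^\[\]]*)*\]"

# Same shape as the PCRE pattern: text that is neither bracket nor pipe,
# interleaved with bracketed groups, stopping at the first pipe outside brackets.
FIRST_PART = re.compile(r"(?=[^|])[^\[\]|]*(?:" + BRACKETED + r"[^\[\]|]*)*")


def before_first_top_level_pipe(text):
    """Return the text before the first pipe that is not inside brackets."""
    match = FIRST_PART.match(text)
    # Trailing whitespace before the pipe is dropped for convenience.
    return match.group(0).rstrip() if match else ""


if __name__ == "__main__":
    sample = 'a=1 [ x | y "[^_]*" | z ] b=2 | stats count'
    print(before_first_top_level_pipe(sample))
```

With deeper nesting this sketch would stop early at the unmatched bracket, which is the case the recursive `(?1)` reference in the PCRE version handles.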

If you also need the empty parts, you can rewrite it like this:

(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)
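
As a cross-check on what the parts should look like, the following Python sketch splits on top-level pipes with a plain depth counter instead of a regex. This is a different technique from the pattern above, and edge cases may differ (for instance, it reports an empty part before a leading pipe, which the pattern would not). The function name `split_top_level` is hypothetical.

```python
def split_top_level(text, sep="|"):
    """Split text on sep wherever sep is not inside square brackets."""
    parts = []
    depth = 0   # current bracket nesting depth
    start = 0   # index where the current part begins
    for i, ch in enumerate(text):
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        elif ch == sep and depth == 0:
            # A separator outside all brackets closes the current part.
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


if __name__ == "__main__":
    for part in split_top_level("a [b|c] ||d"):
        print(repr(part))
```

Because it counts depth rather than matching a fixed shape, this version handles any nesting level; an unmatched closing bracket is simply ignored.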