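I have the following line of text, and I'm trying to extract everything up to the first pipe character that is not enclosed in square brackets.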
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$" | stats values(savedsearch_name) AS search_name
Expected output:
action=search sourcetype=audittrail [ localop | stats count | eval search_id = replace("$top10_drilldown_sid$", "^remote_[^_]*_", "") | table search_id ] [ localop | stats count | eval earliest = $top10_drilldown_earliest$ - 86400 | table earliest ] latest="$top10_drilldown_latest$"
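i.e. everything except the trailing | stats values(savedsearch_name) AS search_name
Based on some lookaround examples, I can (almost) get what I need with this JavaScript regex:
/.*\|(?![^\[]*\])/g
- http://refiddle.com/refiddles/596dec4c75622d608f290000
But this doesn't translate well into a PCRE-compatible expression (plus I'd like to capture everything up to, but not including, the first pipe).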
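From what I've read, the nested square brackets inside the first bracketed group may be an unsolvable complication? There is only ever one level of nested brackets within any given set (e.g. [..[]..] or [..[]..[]..]).
I'll admit I don't think I've got my head wrapped around positive and negative lookarounds, but any help would be appreciated!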
Answer 0 (score: 0)
In this kind of case, it is more efficient to match everything that isn't the delimiter than to try to split:
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*
Details:
(?=[^|])  # lookahead: ensure there is at least one non-pipe character at the
          # current position; the goal is to avoid an empty match.
[^][|]*   # everything that isn't a bracket or a pipe
(?:
    (     # open capture group 1: describes one bracketed part
        \[
        [^][]*+   # everything that isn't a bracket (note that you don't have
                  # to care about the pipe here, since you are between brackets)
        (?:
            (?1)  # refer to the capture group 1 subpattern (this is a recursion,
                  # since the reference is inside capture group 1 itself)
            [^][]*
        )*+
        ]
    )     # close capture group 1
    [^][|]*
)*
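For readers who cannot use a PCRE engine, here is a minimal sketch of the same idea in Python. The standard `re` module has no subpattern recursion like `(?1)`, so this version assumes, as the question states, that brackets nest at most one level deep and simply writes the two levels out by hand. The names `FIRST_PART` and `first_segment` are made up for this illustration, and it is not the pattern from the answer, only a bounded-depth approximation of it.

```python
import re

# Building blocks (all names here are illustrative, not from the answer).
NO_BRACKET = r"[^\[\]]*"                      # any run without [ or ]
INNER = r"\[" + NO_BRACKET + r"\]"            # a bracket pair with no nesting
OUTER = (r"\[" + NO_BRACKET                   # a bracket pair that may contain
         + r"(?:" + INNER + NO_BRACKET + r")*\]")  # non-nested pairs inside
TOP = r"[^\[\]|]*"                            # top level: no bracket, no pipe

# Everything up to the first pipe that is outside brackets.
FIRST_PART = re.compile(TOP + r"(?:" + OUTER + TOP + r")*")


def first_segment(line):
    """Return the text before the first top-level pipe (whitespace kept)."""
    # The pattern can match the empty string, so match() never returns None.
    return FIRST_PART.match(line).group(0)


if __name__ == "__main__":
    sample = 'a [ b | c "[^_]*" | d ] e | f'
    print(repr(first_segment(sample)))
```

The match keeps the whitespace that precedes the pipe, so call `.rstrip()` on the result if the trailing space is unwanted. If the input can nest deeper than one level, this sketch stops too early, and a recursive pattern or a hand-written scanner is the safer choice.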
If you also need the empty parts, you can rewrite it like this:
(?=[^|])[^][|]*(?:(\[[^][]*+(?:(?1)[^][]*)*+])[^][|]*)*|(?<=\|)
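As a cross-check on what "all parts, including empty ones" should look like, the following is a small sketch that swaps the regex for a plain character scanner with a bracket-depth counter. It is a different technique from the pattern above, it handles any nesting depth, and the function name `split_top_level` is hypothetical. It assumes brackets are balanced; a stray `]` at the top level is just treated as ordinary text.

```python
def split_top_level(line, sep="|"):
    """Split line on sep, ignoring separators inside square brackets."""
    parts = []      # finished segments
    current = []    # characters of the segment being built
    depth = 0       # how many unclosed '[' we are currently inside

    for ch in line:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1

        if ch == sep and depth == 0:
            # Top-level separator: close the current segment (may be empty).
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)

    parts.append("".join(current))  # the last segment has no trailing separator
    return parts


if __name__ == "__main__":
    for part in split_top_level("a [b|c] ||d"):
        print(repr(part))
```

Taking element 0 of the returned list gives the text before the first top-level pipe, which is what the question asks for.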