Question

I'm having trouble parsing an expression like the following:

word1, word2[a,b,c],   word3, ..., wordN

I would like to get the following groups:

g1: word1
g2: word2[a,b,c]
g3: word3

Note that the [.+] part is optional, and the regular expression must be able to match the following expressions:

word1,word2,word3
word1[a,b,c],word2,word3
word1[a,b,c],word2[e,f,g],word3
word1[a,b,c],word2[e,f,g],word3[i,j,l]

I have made a few attempts, but I can't find a way to separate the groups correctly.

Answer 1

I tried this regular expression on https://regex101.com, pasting the expressions into the "Test String" box.

/^([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?)$/gm

Each of the comma-separated words has the form:

([a-zA-Z0-9]+(?:\[.*\])?)

Explanation:

(
  [a-zA-Z0-9]+ # one or more alphanumeric characters (could use \w)
  (?:\[.*\])? # an optional sequence surrounded by []s. (?: ) means a non-capturing group
)
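To check the pattern outside regex101, here is a minimal Python sketch that runs it against the four sample strings from the question. The names WORD, PATTERN, samples and results are only illustrative, and the /gm flags are dropped because each string is matched on its own.

```python
import re

# One word: alphanumerics plus an optional [...] suffix (same as the answer's pattern).
WORD = r"[a-zA-Z0-9]+(?:\[.*\])?"
# Exactly three comma-separated words, anchored at both ends.
PATTERN = re.compile(rf"^({WORD}),({WORD}),({WORD})$")

samples = [
    "word1,word2,word3",
    "word1[a,b,c],word2,word3",
    "word1[a,b,c],word2[e,f,g],word3",
    "word1[a,b,c],word2[e,f,g],word3[i,j,l]",
]

# groups() returns the three captured words for each sample.
results = [PATTERN.match(s).groups() for s in samples]
for s, groups in zip(samples, results):
    print(s, "->", groups)
```

Note that this pattern only matches exactly three words and does not allow whitespace after the commas, so the first example in the question (with spaces and N words) would need a different approach.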

Answer 2

For now, this seems to work:

import re

# Raw string, so that \w and \[ reach the regex engine unchanged.
rgx = re.compile(r"(\w+(\[.*?\])*).*?,?")
[key for key, val in rgx.findall("word1, word2[a,b,[c,,,]],     word,3")]

# The regex first looks for word characters (letters, digits, underscore) with \w+.
# Then (\[.*?\])* optionally matches a `[` and everything up to the first closing `]`.
# findall returns one tuple per match, e.g. ('word2[a,b,c]', '[a,b,c]').
# We iterate over the tuples and keep only the first value of each.

Output:

['word1', 'word2[a,b,[c,,,]', 'word', '3']

Another example:

[key for key, val in rgx.findall("word1[bbbb,cccc],word2[bbbb,cccc] ")]

Output:

['word1[bbbb,cccc]', 'word2[bbbb,cccc]']

PS: I'm still improving it.
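One possible tightening, offered only as a sketch and not as this answer's final version: make the bracket part a non-capturing group so that findall returns plain strings, and use [^\]]* so the bracket part stops at the first closing bracket. The names ITEM and find_items are hypothetical, and nested brackets are still not handled.

```python
import re

# A word, optionally followed by one non-nested [...] block.
ITEM = re.compile(r"\w+(?:\[[^\]]*\])?")

def find_items(text):
    """Return every word (with its optional [...] suffix) found in text."""
    # No capturing groups, so findall yields the full matches as strings.
    return ITEM.findall(text)

print(find_items("word1, word2[a,b,c],   word3"))
print(find_items("word1[bbbb,cccc],word2[bbbb,cccc] "))
```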

Answer 3

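You can use re.split to split only on the commas that are outside brackets. These can be identified by the fact that such a comma is never followed by a closing bracket before an opening one (checked with a negative lookahead). This trick only works with non-nested brackets.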

Demo of the re.split approach:

http://ideone.com/7vIwFM
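The negative-lookahead split described in this answer might look like the following in Python. This is a sketch under assumptions: the exact pattern and the names SPLIT_RE and split_outside_brackets are illustrative, not taken from the linked demo.

```python
import re

# Split on a comma (plus trailing whitespace) unless a closing bracket
# appears ahead with no other bracket in between, which would mean the
# comma sits inside [...].
SPLIT_RE = re.compile(r",\s*(?![^\[\]]*\])")

def split_outside_brackets(text):
    """Split text on commas that are not inside non-nested [...] blocks."""
    return SPLIT_RE.split(text)

print(split_outside_brackets("word1, word2[a,b,c],   word3"))
print(split_outside_brackets("word1[a,b,c],word2[e,f,g],word3[i,j,l]"))
```

Unlike a fixed three-group pattern, this handles any number of words, at the cost of not validating what each word looks like.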
