Regex for comma-separated words

Time: 2017-03-03 10:07:07

Tags: python regex

I am having some trouble parsing expressions like the following:

word1, word2[a,b,c],   word3, ..., wordN

I would like to get the following groups:

g1: word1
g2: word2[a,b,c]
g3: word3

Note that the bracketed part [.+] is optional; the regex must be able to match all of the following expressions:

word1,word2,word3
word1[a,b,c],word2,word3
word1[a,b,c],word2[e,f,g],word3
word1[a,b,c],word2[e,f,g],word3[i,j,l]

I have made a few attempts, but I cannot find a way to separate the groups correctly.

3 answers:

Answer 0 (score: 1)

I tried this regex on https://regex101.com, with the expressions pasted into the "Test String" box.

/^([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?),([a-zA-Z0-9]+(?:\[.*\])?)$/gm
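The /.../gm wrapper is regex101's display syntax rather than part of the pattern. A minimal sketch of the same pattern in Python, assuming the four test expressions from the question arrive as separate lines of one string: re.MULTILINE stands in for the m flag, and re.finditer stands in for g, yielding one match per line. Note that it matches exactly three words per line.

```python
import re

# Answer 0's per-word pattern, repeated three times between commas.
WORD = r"([a-zA-Z0-9]+(?:\[.*\])?)"
# re.MULTILINE makes ^ and $ match at each line boundary (regex101's "m").
pattern = re.compile(rf"^{WORD},{WORD},{WORD}$", re.MULTILINE)

text = """word1,word2,word3
word1[a,b,c],word2,word3
word1[a,b,c],word2[e,f,g],word3
word1[a,b,c],word2[e,f,g],word3[i,j,l]"""

# finditer plays the role of regex101's "g" flag: one match per line.
for m in pattern.finditer(text):
    print(m.groups())
# → ('word1', 'word2', 'word3')
# → ('word1[a,b,c]', 'word2', 'word3')
# → ('word1[a,b,c]', 'word2[e,f,g]', 'word3')
# → ('word1[a,b,c]', 'word2[e,f,g]', 'word3[i,j,l]')
```

The greedy .* still yields the right groups here only because the line is anchored and must split into exactly three comma-separated parts, so the engine backtracks to the correct ].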

Each comma-separated word has the form:

([a-zA-Z0-9]+(?:\[.*\])?)

Explanation:

(
  [a-zA-Z0-9]+ # one or more alphanumeric characters (could use \w)
  (?:\[.*\])? # an optional sequence surrounded by []s. (?: ) means a non-capturing group
)
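The full pattern above hard-codes exactly three words. One way to handle any number of words, sketched here under the assumption that brackets are never nested, is to reuse the per-word pattern with re.findall. Inside the brackets, the greedy .* is swapped for [^\]]* (anything except a closing bracket); this swap is an editorial change, not part of the original answer. Without it, a lone \[.*\] would run from the first [ to the last ] on the line and merge several words into one match.

```python
import re

# Per-word pattern from the answer, with ".*" replaced by "[^\]]*" so the
# bracket part stops at the first "]" (assumes no nested brackets).
# The bracket group is non-capturing, so findall returns whole words.
WORD = re.compile(r"[a-zA-Z0-9]+(?:\[[^\]]*\])?")


def split_words(expr: str) -> list[str]:
    """Return every word, with its optional [...] suffix, found in expr."""
    return WORD.findall(expr)


print(split_words("word1, word2[a,b,c],   word3, word4"))
# → ['word1', 'word2[a,b,c]', 'word3', 'word4']
```

This extracts words rather than validating the separators: input such as "word1 word2" would also yield two words.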

Answer 1 (score: 1)

This seems to work for now:

import re
rgx = re.compile(r"(\w+(\[.*?\])*).*?,?")
print([key for key, val in rgx.findall("word1, word2[a,b,[c,,,]],     word,3")])

# This regex starts by matching alphanumeric characters with \w+.
# It then checks whether a `[` follows and, if so, takes everything up to the next `]`: (\[.*?\])*.
# findall returns tuples such as ('word2[a,b,c]', '[a,b,c]'),
# so we iterate over the tuples and keep only the first value of each.
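Because the pattern has two capturing groups, findall returns (word, brackets) tuples, which is why the comprehension unpacks key, val. A small variant, an editorial suggestion rather than the answerer's code, makes the inner group non-capturing with (?:...) so findall returns the words directly. The trailing .*?,? is dropped as well: the lazy .*? always matches the empty string there, and an optionally consumed comma would be skipped anyway, so it never changes which words are captured.

```python
import re

# (?:...) makes the bracket group non-capturing, so findall returns one
# string per match instead of a (word, brackets) tuple.
rgx = re.compile(r"\w+(?:\[.*?\])*")

print(rgx.findall("word1[bbbb,cccc],word2[bbbb,cccc] "))
# → ['word1[bbbb,cccc]', 'word2[bbbb,cccc]']
print(rgx.findall("word1, word2[a,b,[c,,,]],     word,3"))
# → ['word1', 'word2[a,b,[c,,,]', 'word', '3']
```

The second call reproduces the answer's output, including its limitations: the nested word loses its final ], and word,3 is split into two words.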

Output:

['word1', 'word2[a,b,[c,,,]', 'word', '3']

Another example:

print([key for key, val in rgx.findall("word1[bbbb,cccc],word2[bbbb,cccc] ")])

Output:

['word1[bbbb,cccc]', 'word2[bbbb,cccc]']
PS: Still improving it.

Answer 2 (score: 1)

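You can use re.split to split only on the commas that lie outside the brackets. Such commas can be recognized because, looking ahead from them, you never reach a closing bracket before an opening one (checked with a negative lookahead). This trick only works with non-nested brackets.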
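A minimal sketch of the technique described above, assuming non-nested brackets: split on a comma (plus any following whitespace) only when the negative lookahead (?![^\[\]]*\]) confirms that the next bracket ahead is not a ]. The exact pattern and the helper name split_outside_brackets are reconstructions for illustration, not the answerer's original code.

```python
import re

# Split on "," plus optional whitespace, but only where the lookahead
# fails to find a "]" before any "[", i.e. the comma is outside brackets.
# Only valid for non-nested brackets.
SEP = re.compile(r",\s*(?![^\[\]]*\])")


def split_outside_brackets(expr: str) -> list[str]:
    """Hypothetical helper: split expr on top-level commas."""
    return SEP.split(expr.strip())


print(split_outside_brackets("word1, word2[a,b,c],   word3"))
# → ['word1', 'word2[a,b,c]', 'word3']
```

With nested input such as word2[a,b,[c]] the lookahead from the comma after b hits [ first, so that comma is wrongly treated as a separator, which is exactly the non-nested limitation stated above.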

Output of re.split:

http://ideone.com/7vIwFM
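Both regex answers show or state that nested brackets break them: Answer 1's output drops the final ] of word2[a,b,[c,,,]], and the lookahead trick is limited to non-nested brackets. Python's standard re module has no recursive patterns, so one swapped-in alternative, plainly not a regex and not taken from any of the answers, is a short loop that tracks bracket depth and splits only at depth zero. A sketch, assuming [ and ] are the only bracket characters; split_top_level is a hypothetical name.

```python
def split_top_level(expr: str, sep: str = ",") -> list[str]:
    """Split expr on sep only at bracket depth 0, so nesting is handled."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)  # tolerate a stray "]"
        if ch == sep and depth == 0:
            # Top-level separator: close off the current word.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


print(split_top_level("word1, word2[a,b,[c,,,]],     word3"))
# → ['word1', 'word2[a,b,[c,,,]]', 'word3']
```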