Split a string according to the number of commas

Date: 2016-10-07 13:25:50

Tags: python regex

I have text that is delimited by commas.

e.g.:

FOO( something, BOO(tmp, temp), something else)

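The elements may be other strings that themselves contain commas...

I want to split the text inside FOO into its elements and then parse each element separately.

What I do know is that FOO must have two commas.

How can I split the contents of FOO into three elements?

Note: the other elements may be BOO(ddd,ddd) or just ddd, so I can't assume a simple regex such as 'FOO\(\w+,BOO(\w+,\w+),\w+\)'.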

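4 answers:

Answer 0 (score: 0)

Assuming the string is Python code, you can use a parser. If you look closely at the result, you may find that it is not as bad as it first looks.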

>>> from parser import *
>>> source="FOO( something, BOO(tmp, temp), something)"
>>> st=suite(source)
>>> st2tuple(st)
(257, (268, (269, (270, (271, (272, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'FOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'BOO')), (322, (7, '('), (330, (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'tmp')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'temp'))))))))))))))))), (8, ')')))))))))))))))), (12, ','), (331, (302, (306, (307, (308, (309, (312, (313, (314, (315, (316, (317, (318, (319, (320, (1, 'something'))))))))))))))))), (8, ')')))))))))))))))))), (4, ''))), (4, ''), (0, ''))
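Note that the parser module used above was deprecated in Python 3.9 and removed in Python 3.10. As a rough sketch of the same idea on current Python versions, the standard ast module can parse the string and return the source text of each top-level argument. This assumes the input really is a valid Python expression (so an element like "something else", with a space in it, would not parse) and that Python 3.8 or later is available for ast.get_source_segment.

```python
import ast

source = "FOO( something, BOO(tmp, temp), something)"

# Parse the string as a single expression; its body is the outer call node.
call = ast.parse(source, mode="eval").body
if not isinstance(call, ast.Call):
    raise ValueError("expected a call expression such as FOO(...)")

# Recover the exact source text of each top-level argument (Python 3.8+).
elements = [ast.get_source_segment(source, arg) for arg in call.args]
print(elements)  # ['something', 'BOO(tmp, temp)', 'something']
```

Because the parser tracks nesting itself, this works for arbitrarily deep calls, at the cost of accepting only valid Python syntax.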

Answer 1 (score: 0)

You can use this regex

,(?=(?:(?:\([^)]*\))?[^)]*)+\)$)

to split your string on commas, but not on the commas inside BOO(...)
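A minimal sketch of how this pattern might be applied with re.split from the standard library; the variable names are only illustrative. The lookahead matches the whole string, so the outer "FOO(" and the closing ")" stay attached to the first and last pieces, and the inner `\([^)]*\)` only copes with one level of nested parentheses.

```python
import re

s = "FOO( something, BOO(tmp, temp), something else)"

# A comma is a split point only if the rest of the string, up to the final ')',
# consists of balanced one-level groups "(...)" and non-')' characters.
pattern = r",(?=(?:(?:\([^)]*\))?[^)]*)+\)$)"

parts = re.split(pattern, s)
print(parts)  # ['FOO( something', ' BOO(tmp, temp)', ' something else)']
```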


Answer 2 (score: 0)

You can do it with the regex module, which supports recursion (useful for dealing with nested structures):

import regex

s = 'FOO( something, BOO(tmp, temp), something else)'

pat = regex.compile(r'''(?(DEFINE) # inside a definition group
    # you can define subpatterns to use later
    (?P<elt>     # define the subpattern "elt"
        [^,()]*+
        (?:
            \( (?&elt) (?: , (?&elt) )* \)
            [^,()]*
        )*+
    )
)
# start of the main pattern
FOO\( \s*
    (?P<elt1> (?&elt) )  # capture group "elt1" contains the subpattern "elt"
    , \s*
    (?P<elt2> (?&elt) )  # same here
    , \s*
    (?P<elt3> (?&elt) )  # etc.
\)''', regex.VERSION1 | regex.VERBOSE )

m = pat.search(s)

print(m.group('elt1'))
print(m.group('elt2'))
print(m.group('elt3'))


Answer 3 (score: 0)

Assuming you need the list of elements inside FOO, pre-process the string first

>>> import re
>>> s = 'FOO( something, BOO(tmp, temp), something else)'
>>> s
'FOO( something, BOO(tmp, temp), something else)'
>>> s = re.sub(r'^[^(]+\(|\)\s*$','',s)
>>> s
' something, BOO(tmp, temp), something else'

Using the regex module:

>>> import regex
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', s)
[' something', ' BOO(tmp, temp)', ' something else']
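  • [^,(]+\([^)]+\)(*SKIP)(?!) skips over anything matching the pattern [^,(]+\([^)]+\); (?!) always fails, and (*SKIP) prevents the engine from retrying inside the skipped text
  • |, is the alternative that actually splits the input string, in this case on ,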


Another example:

>>> t = 'd(s,sad,e),g(3,2),c(d)'
>>> regex.split(r'[^,(]+\([^)]+\)(*SKIP)(?!)|,', t)
['d(s,sad,e)', 'g(3,2)', 'c(d)']
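The skipped pattern [^,(]+\([^)]+\) stops at the first closing parenthesis, so it only handles one level of nesting. For comparison, here is a minimal sketch of a plain-Python splitter that counts parenthesis depth instead; the function name split_top_level is hypothetical and not part of any library, and the sketch assumes the parentheses in the input are balanced.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts = []
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current element begins
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            # A separator outside all parentheses ends the current element.
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


inner = " something, BOO(tmp, temp), something else"
print(split_top_level(inner))            # ['something', 'BOO(tmp, temp)', 'something else']
print(split_top_level("a(b(c,d),e),f"))  # ['a(b(c,d),e)', 'f']
```

Unlike the regex versions, this strips the surrounding whitespace from each element and needs no third-party module.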