Question

Given the solution in "How do i extract a list of elements encased in quotation marks bounded by <> and delimited by commas - python, regex?", I am able to capture the values that follow my desired pattern: a CAPITALIZED.PREFIX followed by values inside angle brackets:

< "value1" , "value2", ... >

"""calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "what" >,\n THIS + ]."""

But I run into problems when I have strings like the one above. The desired output is:

('ORTH.FOO', ['cali.ber,kl','calf','done'])
('ORHT2BAR', ['what so ever >', 'this that mess < up'])
('JOKE', ['whatthe ', 'what'])

My attempt so far only gives me the first tuple. How do I get the desired output with all possible tuples?

Answer 1

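Regular expressions, at least in Python's re module, do not support "recursive" parsing. Capture the group with a regular expression first, then process the text between the < and > characters separately.

The shlex module can parse your quoted strings nicely: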

import shlex
import re

intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up">\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30" ]."""
# Capture the first CAPITALIZED.PREFIX and the text between " < " and " >".
pattern = re.compile(r'.*?([A-Z0-9\.]*) < ([^>]*) >.*', flags=re.DOTALL)
f, v = pattern.match(intext).groups()

# Let shlex split the quoted values, treating commas as whitespace.
parser = shlex.shlex(v, posix=True)
parser.whitespace += ','
names = list(parser)

print(f, names)

Output:

ORTH.FOO ['cali.ber,kl', 'calf', 'done']
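
The snippet above stops at the first match. A minimal sketch of one way to collect every prefix follows. It swaps in a different pattern that is not from the answer: the pattern treats quoted strings as units, so a > or < inside quotes does not end the match, and re.finditer walks over all matches. The helper name parse_values is made up for illustration, and the input is the longer string from the question.

```python
import re
import shlex

intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "what" >,\n THIS + ]."""

# Assumed pattern (not from the answer): a capitalized prefix, then <...>
# whose body is built from whole quoted strings or from characters that are
# neither quotes nor angle brackets.
pattern = re.compile(r"""([A-Z][A-Z0-9.]*)\s*<((?:[^<>"']|"[^"]*"|'[^']*')*)>""")

def parse_values(raw):
    """Split the text between < and > into unquoted values (hypothetical helper)."""
    parser = shlex.shlex(raw, posix=True)
    parser.whitespace += ','  # commas separate values, just like spaces
    return list(parser)

pairs = [(m.group(1), parse_values(m.group(2))) for m in pattern.finditer(intext)]
for pair in pairs:
    print(pair)
```

Prefixes such as LKEYS.KEYREL.CARG are skipped because their value starts with a quote rather than an angle bracket.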

Answer 2

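Hmm, silly me. Somehow I didn't test the whole string on my machine ^^;

Anyway, I used this regex and it works. You just need to pick the results you want out of the list, which I guess is okay. I'm not very good at Python and don't know how to convert this list into an array or tuple: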

>>> import re
>>> intext = """calf_n1 := n_-_c_le & n_-_pn_le &\n [ ORTH.FOO < "cali.ber,kl", 'calf' , "done" >,\nLKEYS.KEYREL.PRED "_calf_n_1_rel",\n ORHT2BAR <"what so ever >", "this that mess < up"> ,\n LKEYS.KEYREL.CARG "<20>",\nLOOSE.SCREW ">20 but <30"\n JOKE <'whatthe ', "what" >,\n THIS + ]."""
>>> results = re.findall(r'\n .*?([A-Z0-9.]*) < *((?:[^>\n]|>")*) *>.*?(?:\n|$)', intext)
>>> print(results)
[('ORTH.FOO', '"cali.ber,kl", \'calf\' , "done" '), ('ORHT2BAR', '"what so ever >", "this that mess < up"'), ('JOKE', '\'whatthe \', "what" ')]

The parentheses mark the first-level elements, and the single quotes mark the second-level elements.
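
For the conversion step this answer leaves open, here is a minimal sketch. It assumes each captured value is a string of comma-separated items in single or double quotes, shaped like the findall() results above. The names QUOTED and split_quoted are made up for illustration.

```python
import re

# Pairs shaped like the findall() results: (prefix, raw quoted values).
results = [
    ('ORTH.FOO', '"cali.ber,kl", \'calf\' , "done" '),
    ('ORHT2BAR', '"what so ever >", "this that mess < up"'),
    ('JOKE', '\'whatthe \', "what" '),
]

# One double-quoted item or one single-quoted item; only the group that
# belongs to the matching alternative is set, the other one is None.
QUOTED = re.compile(r'"([^"]*)"|\'([^\']*)\'')

def split_quoted(raw):
    """Return the quoted items in raw without their quotes (hypothetical helper)."""
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in QUOTED.finditer(raw)]

tuples = [(prefix, split_quoted(raw)) for prefix, raw in results]
for item in tuples:
    print(item)
```

Commas and angle brackets inside a quoted item are left alone because each item is matched as a whole.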

Capture patterns in regex recursively - Python
