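Yesterday I posted a similar question here: Python Regex Named Groups. For simple cases, that approach works very well.
After some research, I read about the pyparsing library, which seems perfect for my task.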
from pyparsing import *

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'
command_s = Suppress(Optional('[') + Literal('@'))
command_e = Suppress(Literal('@') | Literal(']'))
task = Word(alphas)
arguments = ZeroOrMore(
    Word(alphas) +
    Suppress(
        Optional(Literal(',') + White()) | Optional(White() + Literal('@'))
    )
)
command = Group(OneOrMore(command_s + task + arguments + command_e))
print(command.parseString(text))
# which outputs only the first @a sequence
# [['a', 'eee', 'fff', 'fff', 'ggg']]
# the structure should be something like:
[
['a', 'eee', 'fff fff', 'ggg'],
['b', 'eee', 'fff', 'ggg'],
['c', 'eee eee', 'fff fff', 'ggg ggg'],
['d']
]
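An @ marks the start of a sequence. The first word is the task (a), followed by optional comma-separated arguments (eee, fff fff, ggg). The problem is that the code above ignores @b, @c and @d. Also, "fff fff" is treated as two separate arguments when it should be a single one.
Answer 0 (score: 4)
See the embedded comments.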
text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'
from pyparsing import *
LBRACK,RBRACK,AT = map(Suppress,"[]@")
key = AT + Word(alphas)
# use originalTextFor to preserve whitespace between words between commas
list_item = originalTextFor(OneOrMore(Word(alphas)))
# define a key_value pair using Group to preserve structure
key_value = Group(key + Optional(delimitedList(list_item)))
parser = LBRACK + OneOrMore(key_value) + RBRACK
print(parser.parseString(text))
This prints the output you want.
[['a', 'eee', 'fff fff', 'ggg'],
['b', 'eee', 'fff', 'ggg'],
['c', 'eee eee', 'fff fff', 'ggg ggg'],
['d']]
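To make the rule that the grammar encodes concrete, here is a rough plain-Python sketch of the same decomposition, without pyparsing. The helper name `parse_commands` is made up for illustration, and unlike the pyparsing grammar it does no validation of the input (it simply assumes every command starts with `@` and that arguments are separated by commas).

```python
def parse_commands(text):
    """Split '[@key arg, arg ...]' text into [key, arg, ...] lists.

    Hypothetical helper for illustration only; it does not validate input.
    """
    body = text.strip().strip('[]')
    commands = []
    # Each command starts at '@'; the piece before the first '@' is empty.
    for chunk in body.split('@')[1:]:
        # The first word is the task, the rest is the argument list.
        key, _, rest = chunk.strip().partition(' ')
        # Split on commas and collapse whitespace runs inside each argument.
        args = [' '.join(part.split()) for part in rest.split(',') if part.strip()]
        commands.append([key] + args)
    return commands


text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'
for command in parse_commands(text):
    print(command)
```

The pyparsing version is still preferable when malformed input should be rejected with a useful error instead of being silently split.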
For extra credit, here is how to have pyparsing define the keys for you:
# Extra credit:
# use Dict to auto-define named groups using each '@x' as a key
parser = LBRACK + Dict(OneOrMore(key_value)) + RBRACK
result = parser.parseString(text)
# print the parsed keys
print(list(result.keys()))
# print a value for a particular key
print(result['c'])
# print a value for a particular key using object notation
print(result.b)
# dump out the whole structure to see just what we got
print(result.dump())
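This prints the following (the key order in the first line may differ depending on the Python version):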
['a', 'c', 'b', 'd']
['eee eee', 'fff fff', 'ggg ggg']
['eee', 'fff', 'ggg']
[['a', 'eee', 'fff fff', 'ggg'], ['b', 'eee', 'fff', 'ggg'], ['c', 'eee eee', 'fff fff', 'ggg ggg'], ['d']]
- a: ['eee', 'fff fff', 'ggg']
- b: ['eee', 'fff', 'ggg']
- c: ['eee eee', 'fff fff', 'ggg ggg']
- d:
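If plain Python containers are more convenient than a `ParseResults` object, the parsed structure can be converted with `asList()`. The following is a minimal sketch, assuming Python 3 and a pyparsing release that still provides the camelCase names used in this answer (`parseString`, `delimitedList`, `originalTextFor`, `asList`). The name `by_key` is only illustrative.

```python
from pyparsing import (Dict, Group, OneOrMore, Optional, Suppress, Word,
                       alphas, delimitedList, originalTextFor)

text = '[@a eee, fff fff, ggg @b eee, fff, ggg @c eee eee, fff fff,ggg ggg@d]'

# Same grammar as in the answer above.
LBRACK, RBRACK, AT = map(Suppress, "[]@")
key = AT + Word(alphas)
list_item = originalTextFor(OneOrMore(Word(alphas)))
key_value = Group(key + Optional(delimitedList(list_item)))
parser = LBRACK + Dict(OneOrMore(key_value)) + RBRACK

result = parser.parseString(text)

# Convert the ParseResults into ordinary nested lists.
nested = result.asList()

# Build a plain dict mapping each task to its (possibly empty) argument list.
by_key = {row[0]: row[1:] for row in nested}

print(nested)
print(by_key['c'])
print(by_key['d'])
```

Building the dict from the nested lists keeps a task with no arguments, such as `d`, mapped to an empty list.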