I want to split a string using a regular expression.
For example:
when [python] or [html ] demo "css html" -[javascript] score:5
From this string I want the following lists:
contains = ['when', 'demo']
word_press = ["css html"]
tags = ['python', 'or', 'html', '-', 'javascript']
options = [{score:5}]
"[]"
(brackets): every word inside them belongs in the tags list. Words between "" go in the word_press list. Words containing : go in the options list. I tried this:
((?:or\s|-)?\[.*?\])|(".*?")|([a-z]+:\d*)|(\S+)
It works fine, but when I use it with Python:
>>> import re
>>> s = '''[python] or [html] how to "how to" user:2525
... [demo] how to createscore:5
... when [python] or [html] demo "css html" -[javascript] score:5'''
>>> re.findall('''((?:or\s|-)?\[.*?\])|(".*?")|([a-z]+:\d*)|(\S+)''', s)
[('[python]', '', '', ''),
('or [html]', '', '', ''),
('', '', '', 'how'),
('', '', '', 'to'),
('', '"how to"', '', ''),
('', '', 'user:2525', ''),
('[demo]', '', '', ''),
('', '', '', 'how'),
('', '', '', 'to'),
('', '', 'createscore:5', ''),
('', '', '', 'when'),
('[python]', '', '', ''),
('or [html]', '', '', ''),
('', '', '', 'demo'),
('', '"css html"', '', ''),
('-[javascript]', '', '', ''),
('', '', 'score:5', '')]
It returns a list of tuples. Is there a way to get each group as its own list instead, like this:
group1 = ['[python]', 'or [html]', '[demo]', '[python]', 'or [html]', '-[javascript]']
...
Answer 0 (score: 1)
>>> import re
>>> s = '''[python] or [html] how to "how to" user:2525
... [demo] how to createscore:5
... when [python] or [html] demo "css html" -[javascript] score:5'''
Here is one possible regular expression (with inline comments) that captures the required information (see the demo here):
>>> pattern = r'''
... (?P<tag>                     # define group one - tags
...     (?:or\s|-)?              #  - acceptable words/chars for preceding tags
...     \[.*?\])                 #  - tag definition - words in square brackets
... |(?P<word_press>".*?")       # group two - words in quotes
... |(?P<options>[a-z]+:\d*)     # group three - options with colons
... |(?P<other>\S+)              # group four - anything left over
... '''
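Because the groups are named, the tokens can also be collected by name rather than by tuple position. The following is only a sketch, assuming the same sample string and pattern as above: `re.finditer` yields match objects, and each match's `lastgroup` attribute gives the name of the alternative that matched. The dictionary name `groups` is an illustrative choice, not part of the original answer.

```python
import re
from collections import defaultdict

s = '''[python] or [html] how to "how to" user:2525
[demo] how to createscore:5
when [python] or [html] demo "css html" -[javascript] score:5'''

pattern = re.compile(r'''
    (?P<tag>(?:or\s|-)?\[.*?\])   # tags, with an optional "or " or "-" prefix
   |(?P<word_press>".*?")         # words in quotes
   |(?P<options>[a-z]+:\d*)       # options with colons
   |(?P<other>\S+)                # anything left over
''', re.VERBOSE)

# Map each group name to the list of tokens that matched it.
groups = defaultdict(list)
for m in pattern.finditer(s):
    # Only one named alternative can match per token, so lastgroup
    # identifies it and m.group() is the whole matched token.
    groups[m.lastgroup].append(m.group())

print(groups['tag'])
print(groups['options'])
```

This keeps the tokens in the order they appear in the string, and a group name that never matched simply yields an empty list.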
Note that using this pattern with findall gives you a list of tuples:
>>> re.findall(pattern, s, re.VERBOSE)
[('[python]', '', '', ''),
('or [html]', '', '', ''),
('', '', '', 'how'),
('', '', '', 'to'),
('', '"how to"', '', ''),
('', '', 'user:2525', ''),
('[demo]', '', '', ''),
('', '', '', 'how'),
('', '', '', 'to'),
('', '', 'createscore:5', ''),
('', '', '', 'when'),
('[python]', '', '', ''),
('or [html]', '', '', ''),
('', '', '', 'demo'),
('', '"css html"', '', ''),
('-[javascript]', '', '', ''),
('', '', 'score:5', '')]
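But here is a functional-programming way to rearrange it (as written, it relies on Python 2, where map returns a list and filter applied to a tuple returns a tuple):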
>>> from functools import partial
>>> map(partial(filter, None), zip(*re.findall(pattern, s, re.VERBOSE)))
[('[python]', 'or [html]', '[demo]', '[python]', 'or [html]', '-[javascript]'),
('"how to"', '"css html"'),
('user:2525', 'createscore:5', 'score:5'),
('how', 'to', 'how', 'to', 'when', 'demo')]
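In Python 3, `map` and `filter` return lazy iterators, so the one-liner above would print a map object rather than the lists shown. A minimal Python 3 sketch of the same regrouping, assuming the same string and pattern, transposes the rows with `zip(*rows)` and drops the empty strings from each column. The names `rows` and `columns` are illustrative.

```python
import re

s = '''[python] or [html] how to "how to" user:2525
[demo] how to createscore:5
when [python] or [html] demo "css html" -[javascript] score:5'''

pattern = r'''
    (?P<tag>(?:or\s|-)?\[.*?\])   # tags, with an optional "or " or "-" prefix
   |(?P<word_press>".*?")         # words in quotes
   |(?P<options>[a-z]+:\d*)       # options with colons
   |(?P<other>\S+)                # anything left over
'''

# One 4-tuple per token; exactly one slot in each tuple is non-empty.
rows = re.findall(pattern, s, re.VERBOSE)

# Transpose rows into one column per group, then drop the empty strings.
columns = [tuple(filter(None, col)) for col in zip(*rows)]

tag, word_press, options, other = columns
print(tag)
```

Unpacking into four names works here because the pattern has exactly four capturing groups; a pattern with a different number of groups would need a matching number of targets.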