I have a dataset containing lines like these:
(event) (tag) [group (artist)] title (form) [addition1] [addition2]
(event) [group (artist)] title (form) [addition1]
[event] [group (artist)] title (form) (addition1)
(tag) [group (artist)] title
[group (artist)] title
title
【tag】 [group (artist)] title 【form】
[group (artist)] title
[group] title
[artist] title
(artist) title
I want to extract the title from each line.
There are three patterns that can match the title:
1.
([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*)\s*[\(\[【])
This matches lines such as *] title (*
2.
([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*))
This matches *] title
3.
(?P<title>[^\(\)\[\]\【\】\s]*)
This matches lines that consist of only the title
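I don't know how to combine these three rules into a single regular expression, so I wrote some Python code to apply them instead.
I'm trying to merge these three rules into one.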
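A minimal sketch of such a fallback, not the code from the original post: it tries the three rules from most to least specific and returns the first title found. The names `extract_title`, `CLOSERS`, `OPENERS`, `WORD` and `PATTERNS` are hypothetical, and two details are assumptions made for illustration: `+` is used instead of `*` so that an empty title never matches, and rules 2 and 3 are anchored with `^`/`$`.

```python
import re

# Bracket characters that close or open a field in the dataset.
CLOSERS = r"\)\]】"
OPENERS = r"\(\[【"
# One run of non-bracket, non-space characters.
# Assumption: '+' rather than the question's '*', to rule out empty titles.
WORD = r"[^\(\)\[\]【】\s]+"

PATTERNS = [
    # Rule 1: "...] title (..."
    re.compile(rf"[{CLOSERS}]\s*(?P<title>{WORD})\s*[{OPENERS}]"),
    # Rule 2: "...] title" at the end of the line (anchor is an assumption)
    re.compile(rf"[{CLOSERS}]\s*(?P<title>{WORD})\s*$"),
    # Rule 3: the line is only the title (anchors are an assumption)
    re.compile(rf"^\s*(?P<title>{WORD})\s*$"),
]


def extract_title(line):
    """Return the title found by the first rule that matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return match.group("title")
    return None


if __name__ == "__main__":
    for sample in [
        "(event) (tag) [group (artist)] title (form) [addition1] [addition2]",
        "(tag) [group (artist)] title",
        "【tag】 [group (artist)] title 【form】",
        "title",
    ]:
        print(extract_title(sample))
```

Trying the rules in sequence works, but it needs three passes in the worst case, which is the motivation for folding them into a single alternation.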
Answer 0 (score: 2)
Something like this:
(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)
Example
>>> import re
>>> strings = ["(event) (tag) [group (artist)] title (form) [addition1] [addition2]", "(event) [group (artist)] title (form) [addition1]", "[event] [group (artist)] title (form) (addition1)", "(tag) [group (artist)] title", "[group (artist)] title", "title", "【tag 】 [group (artist)] title 【form】", "[group (artist)] title", "[group] title", "[artist] title", "(artist) title"]
>>> for string in strings:
...     re.findall(r'(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)', string)
...
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
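
The same idea can also be run as a script rather than at the interactive prompt. The following is only a sketch under a few assumptions: the helper name `title_of` and the constant `TITLE_RE` are hypothetical, `re.search` with the named group is used in place of `re.findall`, and the literal `[` and `]` inside the character classes are escaped, since an unescaped `[` at the start of a set can trigger a FutureWarning about possible nested sets on recent Python versions.

```python
import re

# Title is a run of word characters that
#   - starts the line, or follows "]" or ")" plus one or more spaces, and
#   - ends the line, or is followed by spaces and an opening bracket.
TITLE_RE = re.compile(r"(?:^|[\])] +)(?P<title>\w+)(?: +[\[【(]|$)")


def title_of(line):
    """Return the captured title, or None when the pattern does not match."""
    match = TITLE_RE.search(line)
    return match.group("title") if match else None


if __name__ == "__main__":
    samples = [
        "(event) (tag) [group (artist)] title (form) [addition1] [addition2]",
        "[event] [group (artist)] title (form) (addition1)",
        "(tag) [group (artist)] title",
        "【tag 】 [group (artist)] title 【form】",
        "title",
        "(artist) title",
    ]
    for sample in samples:
        print(title_of(sample))
```

Because `\w+` matches a single run of word characters, this sketch, like the pattern it is based on, assumes the title is one word with no spaces.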