Regex: combining several patterns into one

Time: 2015-09-02 05:27:37

Tags: python regex extract

I have a dataset with the following contents:

(event) (tag) [group (artist)] title (form) [addition1] [addition2]
(event) [group (artist)] title (form) [addition1]
[event] [group (artist)] title (form) (addition1)
(tag) [group (artist)] title
[group (artist)] title
title
【tag】 [group (artist)] title 【form】
[group (artist)] title
[group] title
[artist] title
(artist) title

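I want to get the title from each line. Three patterns can match the title:

1. ([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*)\s*[\(\[【])
   matches lines of the form *] title (*

2. ([\)\]】]\s*(?P<title>[^\(\)\[\]\【\】\s]*))
   matches lines of the form *] title

3. (?P<title>[^\(\)\[\]\【\】\s]*)
   matches lines that consist of the title alone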

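I don't know how to merge these three rules into a single regular expression, so I wrote some Python code that does the following:

  1. Try pattern 1; if it matches, break and take the title
  2. If pattern 1 does not match, try pattern 2
  3. Repeat steps 1 and 2 in a loop

What I am trying to do is merge these three rules into one.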
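The fallback procedure described above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names TITLE, PATTERNS and extract_title are made up for this sketch, the title class uses + instead of * so that an empty title is never accepted, and pattern 3 is anchored so that it only accepts a line consisting of the title alone.

```python
import re

# Title class adapted from the question: anything except brackets and whitespace.
# Assumption: "+" replaces the question's "*" so an empty title cannot match.
TITLE = r"(?P<title>[^()\[\]【】\s]+)"

# Tried in order, most specific first.
PATTERNS = [
    re.compile(r"[)\]】]\s*" + TITLE + r"\s*[(\[【]"),  # pattern 1: "] title ("
    re.compile(r"[)\]】]\s*" + TITLE),                  # pattern 2: "] title"
    re.compile(r"^" + TITLE + r"$"),                    # pattern 3: "title" alone
]


def extract_title(line):
    """Return the title from the first pattern that matches, or None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return match.group("title")
    return None


if __name__ == "__main__":
    for sample in ["[group (artist)] title (form)", "(artist) title", "title"]:
        print(extract_title(sample))
```

Trying the patterns in a fixed order keeps each one simple, at the cost of up to three scans per line; a single combined expression avoids that.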

1 Answer:

Answer 0: (score: 2)

Something like this:
(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)

Regex demo
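For readers who find the one-line pattern hard to parse, here is the same expression spelled out with re.VERBOSE. The name TITLE_RE is chosen for this sketch only, and the brackets inside the character classes are escaped for readability, which should not change what is matched.

```python
import re

TITLE_RE = re.compile(r"""
    (?: ^ | [\])] [ ]+ )     # start of the line, or a closing ] or ) followed by spaces
    (?P<title> \w+ )         # the title: one or more word characters
    (?: [ ]+ [\[【(] | $ )   # spaces and an opening [, 【 or (, or the end of the line
""", re.VERBOSE)

if __name__ == "__main__":
    line = "(event) [group (artist)] title (form) [addition1]"
    # With exactly one capturing group, findall returns that group's text.
    print(TITLE_RE.findall(line))
```

The two non-capturing groups only assert what surrounds the title, so the single named group is the only thing findall returns.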

Example

>>> import re
>>> strings = ["(event) (tag) [group (artist)] title (form) [addition1] [addition2]", "(event) [group (artist)] title (form) [addition1]", "[event] [group (artist)] title (form) (addition1)", "(tag) [group (artist)] title", "[group (artist)] title", "title", "【tag】 [group (artist)] title 【form】", "[group (artist)] title", "[group] title", "[artist] title", "(artist) title"]

>>> for string in strings:
...     re.findall(r'(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)', string)
...
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
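One gap worth noting: the question's patterns also accept 】 as the bracket before the title, while the pattern above only accepts ] or ). A hypothetical line such as 【tag】 title (not present in the sample data) would therefore produce no match. A possible adjustment, offered as an assumption rather than as part of the original answer, is to add 】 to the leading character class; TITLE_RE_WIDE and title_of are names invented for this sketch.

```python
import re

# Pattern as given in the answer, with brackets escaped for readability.
TITLE_RE_ORIGINAL = re.compile(r"(?:^|[\])] +)(?P<title>\w+)(?: +[\[【(]|$)")

# Hypothetical variant: 】 is also accepted as the bracket before the title.
TITLE_RE_WIDE = re.compile(r"(?:^|[\])】] +)(?P<title>\w+)(?: +[\[【(]|$)")


def title_of(line, pattern=TITLE_RE_WIDE):
    """Return the captured title, or None when the line does not match."""
    match = pattern.search(line)
    return match.group("title") if match else None


if __name__ == "__main__":
    line = "【tag】 title"
    print(title_of(line, TITLE_RE_ORIGINAL))  # no match with the original class
    print(title_of(line))                     # matched once 】 is allowed
```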