Question

I have a dataset containing lines like the following:

(event) (tag) [group (artist)] title (form) [addition1] [addition2]
(event) [group (artist)] title (form) [addition1]
[event] [group (artist)] title (form) (addition1)
(tag) [group (artist)] title
[group (artist)] title
title
【tag】 [group (artist)] title 【form】
[group (artist)] title
[group] title
[artist] title
(artist) title

I want to get the title from each line. There are three patterns that can match the title:
1. ([\)\]】]\s*(?P<title>[^\[\]\【\】\s]*)\s*[\(\[【])

This matches lines such as *] title (*

2. ([\)\]】]\s*(?P<title>[^\[\]\【\】\s]*))
This matches lines such as *] title

3. (?P<title>[^\[\]\【\】\s]*)
This matches lines that consist of just title

I don't know how to combine these three rules into one regular expression, so I wrote some Python code that applies them one after another:

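Try pattern 1; if it matches, stop and take the title.
If pattern 1 doesn't match, try pattern 2.
Repeat the two steps above with the next pattern.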
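The fallback loop described above might look like the following minimal sketch. The asker's actual code isn't shown, so this is a reconstruction; PATTERNS and title_by_fallback are hypothetical names, and pattern 2 is written with its outer group closed so that it compiles.

```python
import re

# The question's three rules, tried in order of precedence.
# Pattern 2 is written with its outer group closed so that it compiles.
PATTERNS = [
    re.compile(r"([\)\]】]\s*(?P<title>[^\[\]\【\】\s]*)\s*[\(\[【])"),  # ] title (
    re.compile(r"([\)\]】]\s*(?P<title>[^\[\]\【\】\s]*))"),             # ] title
    re.compile(r"(?P<title>[^\[\]\【\】\s]*)"),                          # title
]


def title_by_fallback(line):
    """Return the title group of the first pattern that matches line, or None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:  # a match object is truthy even when the captured title is empty
            return match.group("title")
    return None
```

Calling it on the sample data suggests the rules need tightening before they are merged: the title class [^\[\]【】\s] does not exclude ( or ), so the first dataset line yields (tag) instead of title, and for [group (artist)] title pattern 2 already matches right after the ) with an empty title.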

I would like to merge these three rules into a single regular expression.

Answer 1

Something like:

(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)


Example

>>> import re
>>> strings = ["(event) (tag) [group (artist)] title (form) [addition1] [addition2]",
...            "(event) [group (artist)] title (form) [addition1]",
...            "[event] [group (artist)] title (form) (addition1)",
...            "(tag) [group (artist)] title",
...            "[group (artist)] title",
...            "title",
...            "【tag】 [group (artist)] title 【form】",
...            "[group (artist)] title",
...            "[group] title",
...            "[artist] title",
...            "(artist) title"]
>>> for string in strings:
...     re.findall(r'(?:^|[])] +)(?P<title>\w+)(?: +[[【(]|$)', string)
...
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
['title']
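To show how the answer's single expression covers all three of the question's cases, here is the same pattern rewritten with re.VERBOSE and a comment on each part. It is intended to behave like the one-line version; the only change is that the [ inside the last character class is escaped, because recent Python 3 versions emit a FutureWarning ("Possible nested set") for a class that begins with [[. TITLE_RE, extract_title and lines are illustrative names, not part of the answer.

```python
import re

# Same expression as in the answer, written with re.VERBOSE so each part can carry a comment.
# In verbose mode unescaped spaces are ignored, so literal spaces are written as "\ ".
TITLE_RE = re.compile(r"""
    (?:^|[])]\ +)     # start of the line, or a ']' or ')' followed by one or more spaces
    (?P<title>\w+)    # the title: a run of word characters (a single word)
    (?:\ +[\[【(]|$)  # then spaces and an opening '[', '【' or '(', or the end of the line
""", re.VERBOSE)


def extract_title(line):
    """Return the first title found in line, or None if the pattern does not match."""
    match = TITLE_RE.search(line)
    return match.group("title") if match else None


lines = [
    "(event) (tag) [group (artist)] title (form) [addition1] [addition2]",
    "(event) [group (artist)] title (form) [addition1]",
    "[event] [group (artist)] title (form) (addition1)",
    "(tag) [group (artist)] title",
    "[group (artist)] title",
    "title",
    "【tag】 [group (artist)] title 【form】",
    "[group (artist)] title",
    "[group] title",
    "[artist] title",
    "(artist) title",
]

for line in lines:
    print(extract_title(line))  # prints "title" for each of the 11 lines
```

Because the title part is \w+, only single-word titles are found: for a line such as [group] two words, extract_title returns None.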

Regex: combine several patterns into one
