Question

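I'm trying to write a regular expression in PHP, but I'm stuck on the repeated part shown below. Is it possible to extract this information with a single regular expression?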

From these, I want to get: Grandma (2013/Bluray)

<h1>Grandma / Nice story of grandma 2013 / Grandparents / Granma on vacation (2013/Bluray)</h1>
<h1>Grandma / Nice story of grandma 2013 / Grandparents (2013/Bluray)</h1>
<h1>Grandma / Nice story of grandma 2013 (2013/Bluray)</h1>
<h1>Grandma (2013/Bluray)</h1>

From these, I want to get: Game of death 2 (1981/HDRip)

<h1>Game of death 2 / TD 2 / Super death towers II / Towers of Death / Game of Death II / Tower of Death (1981/HDRip)</h1>
<h1>Game of death 2 / TD II / Super death towers II / Towers of Death / Game of Death II / Tower of Death (1981/HDRip)</h1>
<h1>Game of death 2 / Super death towers II / Towers of Death / Game of Death II / Tower of Death (1981/HDRip)</h1>
<h1>Game of death 2 / Towers of Death / Game of Death II / Tower of Death (1981/HDRip)</h1>
<h1>Game of death 2 / Towers of Death / Tower of Death (1981/HDRip)</h1>
<h1>Game of death 2 / Tower of Death (1981/HDRip)</h1>

My current regex is /<h1>([^\/]*)(.*)\((.*)\)<\/h1>/i, but it does not work for <h1>Grandma (2013/Bluray)</h1>.

Answer 1

I don't have access to a PHP regex engine to try this, but the following regex works in .NET:

<h1>([^\/(]*)(?:.*)\((.*)\)<\/h1>

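It captures the needed data for all of the sample inputs. On the live demo page, click "20 groups" near the right-hand side to see the contents of the captured groups.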
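Since the pattern above was only tried in .NET, here is a minimal PHP sketch of how it might be run with preg_match(). The / delimiters and the i flag are carried over from the question, and the $samples array is just a subset of the inputs listed there.

```php
<?php
// The answer's pattern wrapped in PHP delimiters; the "i" flag comes from the question.
$pattern = '/<h1>([^\/(]*)(?:.*)\((.*)\)<\/h1>/i';

// A few of the sample inputs from the question.
$samples = [
    '<h1>Grandma / Nice story of grandma 2013 / Grandparents (2013/Bluray)</h1>',
    '<h1>Grandma (2013/Bluray)</h1>',
    '<h1>Game of death 2 / Tower of Death (1981/HDRip)</h1>',
];

$results = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $m)) {
        // $m[1] is the first title (still with its trailing space),
        // $m[2] is the text inside the final pair of parentheses.
        $results[] = [$m[1], $m[2]];
    }
}

foreach ($results as [$title, $info]) {
    printf("[%s] [%s]\n", $title, $info);
}
```

The greedy (?:.*) first runs to the end of the line and then backtracks to the last "(", which is why the second group always holds the year/format part.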

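I changed two things from what you had:

Changed [^\/] to [^\/(] so that the first group does not capture the content in parentheses.
Changed (.*) to (?:.*), making it a non-capturing group, because we don't care about that part of the text.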
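The effect of each change can be seen in isolation with a small PHP sketch. The variable names and the shortened test strings are only for illustration.

```php
<?php
// Change 1: the character class on its own, anchored at the start of the text.
$text = 'Grandma (2013/Bluray)';
preg_match('/^[^\/]*/', $text, $withoutParen);  // class stops only at "/"
preg_match('/^[^\/(]*/', $text, $withParen);    // class stops at "/" or "("
// $withoutParen[0] is "Grandma (2013"
// $withParen[0]    is "Grandma "

// Change 2: capturing versus non-capturing middle group.
$html = '<h1>Grandma / Grandparents (2013/Bluray)</h1>';
preg_match('/<h1>([^\/(]*)(.*)\((.*)\)<\/h1>/', $html, $capturing);
preg_match('/<h1>([^\/(]*)(?:.*)\((.*)\)<\/h1>/', $html, $nonCapturing);
// $capturing has 4 entries:    the full match plus three groups,
//                              with "2013/Bluray" in $capturing[3].
// $nonCapturing has 3 entries: the full match plus two groups,
//                              with "2013/Bluray" in $nonCapturing[2].
```

With the non-capturing group, the year/format part moves from index 3 to index 2, so any existing code that reads the third group would need to be adjusted.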

This regex captures extra whitespace in some cases, so you should call trim() on the captured groups to remove it.
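One way to apply trim() to every captured group at once is sketched below. parseHeading() is a hypothetical helper name introduced for this example, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// parseHeading() is a hypothetical helper, not part of the original answer.
// It returns [title, info] with surrounding whitespace removed, or null if
// the input does not contain an <h1> of the expected shape.
function parseHeading(string $html): ?array
{
    $pattern = '/<h1>([^\/(]*)(?:.*)\((.*)\)<\/h1>/i';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    // Drop the full match in $m[0], then trim each captured group.
    return array_map('trim', array_slice($m, 1));
}

$parsed = parseHeading('<h1>Game of death 2 / TD 2 / Tower of Death (1981/HDRip)</h1>');
// $parsed is ['Game of death 2', '1981/HDRip']

$missing = parseHeading('<p>no heading here</p>');
// $missing is null
```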
