
Do not capture optional HTML tags in a regular expression

Time: 2019-06-09 15:39:51

Tags: regex

I have HTML text like this:

<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>

This is my regular expression:

<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>

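I want to capture the team name in a group. However, as you can see, the last two lines have no <a> tag, so I made that part optional. The problem is that in the first two lines my expression also captures the closing </a> as part of the team name. How can I avoid this?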

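1 answer:

Answer 0 (score: 0):

Your original expression is nearly right. It is only missing the lazy quantifier (?) after .+, so the greedy (.+) swallows the closing </a> before the optional (<\/a>)? gets a chance to match it. Adding the ? and simplifying slightly gives: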

<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>
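To see the difference the lazy quantifier makes, here is a minimal sketch that runs the question's pattern and a lazy variant of it against one row of the sample input. The variable names (row, greedy, lazy) are illustrative assumptions and not part of the original answer.

```javascript
// One row copied from the question's sample input.
const row = '<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>';

// The question's original pattern: both .+ quantifiers are greedy.
const greedy = /<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>/;

// The same pattern with ? added after each .+ to make it lazy.
const lazy = /<td class="team\d">(<a class="black" href=".+?">)?(.+?)(<\/a>)?<\/td>/;

// Group 2 is the team name in both patterns.
const greedyName = row.match(greedy)[2];
const lazyName = row.match(lazy)[2];

console.log(greedyName); // Tést team</a>
console.log(lazyName);   // Tést team
```

With the greedy version, (.+) takes as much as it can and gives back only what the mandatory <\/td> needs, so the optional (<\/a>)? ends up matching nothing. With the lazy version, (.+?) stops as early as possible, which leaves </a> for the optional group.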

This expression can be visualized with jex.im. The JavaScript snippet below tests it against the sample input:

const regex = /<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>/gm;
const str = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
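If only the team name is needed, the optional tags do not have to be captured at all. The following is a sketch of one possible variation, not the answer's own expression: it assumes non-capturing groups (?:...) for the optional <a> and </a> parts, [^>]* instead of .+? for the tag attributes, and String.prototype.matchAll (ES2020) to collect the results. The names html, teamPattern and teamNames are hypothetical.

```javascript
// Sample rows copied from the question.
const html = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>`;

// (?:...) groups without capturing, so the team name is the only capture group.
// [^>]* matches the attributes without being able to run past the tag's closing >.
const teamPattern = /<td[^>]*>(?:<a[^>]*>)?(.+?)(?:<\/a>)?<\/td>/g;

// matchAll yields one match per row; index 1 of each match is the team name.
const teamNames = Array.from(html.matchAll(teamPattern), (m) => m[1]);

console.log(teamNames);
// [ 'Tést team', 'opponent team', 'test team', 'my  team' ]
```

Because the team name is now group 1 in every match, the calling code does not need to know how many optional groups precede it. For anything beyond simple, regular markup like this, an HTML parser is usually a safer choice than a regular expression.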