Do not capture an optional HTML tag in a regular expression

Date: 2019-06-09 15:39:51

Tags: regex

I have HTML text like this.

<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>

This is my regular expression.

<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>

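I want to capture (read) the team name as a group. However, as you can see, the last two lines have no <a> tag. On the first two lines, my regular expression also captures the closing </a> as part of the team name. How can I avoid this?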
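To make the problem concrete, here is a minimal sketch that runs the expression from the question against one linked row and one plain row. The variable names (`original`, `withLink`, `withoutLink`) are only illustrative and are not part of the question.

```javascript
// The expression from the question, unchanged.
const original = /<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>/;

const withLink = '<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>';
const withoutLink = '<td class="team2">test team</td>';

// Group 2 is meant to hold the team name.
const linkedName = withLink.match(original)[2];
const plainName = withoutLink.match(original)[2];

// The greedy (.+) backtracks only as far as it must, so it keeps "</a>"
// and the optional (<\/a>)? group ends up matching nothing.
console.log(linkedName); // Tést team</a>
console.log(plainName);  // test team
```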

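1 answer:

Answer 0: (score: 0)

Your original expression is close; it is only missing the lazy quantifier (?). The greedy (.+) runs past the team name and swallows the closing </a>, which leaves nothing for the optional (<\/a>)? group to match. Adding the ? and simplifying the expression slightly gives: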

<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>
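If the goal is for the optional tag not to be captured at all, one possible variation (not part of the answer above) is to turn the groups around the tags into non-capturing groups, `(?:...)`, so that the team name is the only capture. A rough sketch, assuming each row sits on a single line; `teamPattern` and `extractTeamName` are hypothetical names:

```javascript
// Hypothetical variant of the expression: (?:...) groups match without
// capturing, and [^>]* skips the tag attributes. Group 1 is the team name.
const teamPattern = /<td[^>]*>(?:<a[^>]*>)?(.+?)(?:<\/a>)?<\/td>/;

// Hypothetical helper: returns the team name for one row, or null if the
// row does not match.
function extractTeamName(row) {
    const match = teamPattern.exec(row);
    return match === null ? null : match[1];
}

const rows = [
    '<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>',
    '<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>',
    '<td class="team2">test team</td>',
    '<td class="team3">my  team</td>',
];

const teamNames = rows.map(extractTeamName);
console.log(teamNames);
// [ 'Tést team', 'opponent team', 'test team', 'my  team' ]
```

The lazy (.+?) still does the real work here; the non-capturing groups only keep the tags out of the result.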

RegEx Circuit

jex.im visualizes regular expressions.

const regex = /<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>/gm;
const str = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
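The loop above prints every group of every match. With the expression from this answer, the team name lands in group 4 (groups 1 to 3 hold the tag attributes and the opening <a> tag, group 5 the closing </a>). A short sketch that collects only that group with `String.prototype.matchAll`; the names `answerRegex`, `sample` and `names` are only illustrative:

```javascript
// Same expression as above; matchAll requires the g flag.
const answerRegex = /<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>/gm;
const sample = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my  team</td>`;

// Keep only group 4 (the team name) from each match.
const names = [...sample.matchAll(answerRegex)].map((m) => m[4]);
console.log(names);
// [ 'Tést team', 'opponent team', 'test team', 'my  team' ]
```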