I have HTML text like this:
<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my team</td>
This is my regular expression:
<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>
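I want to capture (read) the team names. However, as you can see, the last two lines have no <a> tag. My regex also captures the closing </a> in the first two lines. How can I avoid this?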
Answer 0 (score: 0)
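Your original expression is nearly right; it is only missing the lazy quantifier ? after .+ in the name group. Because (.+) is greedy and the following (<\/a>)? is optional, the name group swallows the closing </a>. Adding the ? and simplifying the expression slightly gives the following, where the team name ends up in capture group 4: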
<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>
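To make the greedy versus lazy difference concrete, here is a minimal sketch that applies the question's own pattern to one row, first unchanged and then with only the name group made lazy. The variable names (row, greedy, lazy) are made up for illustration.

```javascript
// Sample row copied from the question's HTML.
const row = '<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>';

// The question's pattern: the name group (.+) is greedy.
const greedy = /<td class="team\d">(<a class="black" href=".+">)?(.+)(<\/a>)?<\/td>/;
// Same pattern, but the name group (.+?) is lazy.
const lazy = /<td class="team\d">(<a class="black" href=".+">)?(.+?)(<\/a>)?<\/td>/;

// Group 2 is the name group in both patterns.
const greedyName = row.match(greedy)[2];
const lazyName = row.match(lazy)[2];

console.log(greedyName); // Tést team</a>
console.log(lazyName);   // Tést team
```

With the greedy group, the optional (<\/a>)? is simply skipped, so the closing tag stays inside the name. The lazy group stops at the first position where the rest of the pattern can match.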
The regular expression can be visualized on jex.im. Demo:
const regex = /<td(.+?)>(<a(.+?)>)?(.+?)(<\/a>)?<\/td>/gm;
const str = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my team</td>`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
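If only the team names are needed, one possible variation (an assumption, not part of the answer above) is to turn the optional tag groups into non-capturing groups so that the name is the only capture. The sketch below keeps the question's class="team\d" check, uses [^>]* for the attributes of the <a> tag, and collects the names with String.prototype.matchAll. The names html, teamPattern and names are hypothetical.

```javascript
// Sample input copied from the question.
const html = `<td class="team2"><a class="black" href="/team/test/">Tést team</a></td>
<td class="team3"><a class="black" href="/team/test/">opponent team</a></td>
<td class="team2">test team</td>
<td class="team3">my team</td>`;

// (?:...) groups do not capture, so group 1 is the team name.
// [^>]* matches the attributes of the optional <a> tag without crossing its ">".
const teamPattern = /<td class="team\d">(?:<a [^>]*>)?(.+?)(?:<\/a>)?<\/td>/g;

// matchAll needs the g flag; map each match to its first capture group.
const names = Array.from(html.matchAll(teamPattern), (m) => m[1]);

console.log(names);
// [ 'Tést team', 'opponent team', 'test team', 'my team' ]
```

This works for the flat, one-cell-per-line markup shown in the question. For arbitrary HTML, a real parser such as DOMParser is generally more robust than a regular expression.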