Regular expression to extract the child elements of an HTML tag?

Time: 2016-04-25 17:30:56

Tags: javascript regex matcher

I have the following markup in an HTML string.

<h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3>

I want to extract the following tags, one result per h3 block:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

I wrote the following regular expression:

<h3[^>]+?>(.*)<\/h3>

But it returns the wrong result, a single match that runs from the first h3 through to the last one:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

Please help me extract the tags.

2 Answers:

Answer 0: (score: 2)

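Use this regular expression. The lazy quantifier +? stops each match at the first </h3> instead of the last one, and [^$] matches any character except a literal $, newlines included: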

<h3[^>]+?>([^$]+?)<\/h3>

Example here:

https://regex101.com/r/pQ5nE0/2
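The difference between the greedy pattern in the question and a lazy one can be sketched in JavaScript as follows. This is only an illustration under some assumptions: the helper name captureAll and the shortened sample markup are made up for the example, and [\s\S] stands in for the dot because the dot does not match newlines in JavaScript unless the s flag is set.

```javascript
// Sample input: two adjacent <h3> blocks, shortened from the question.
var html =
  '<h3 class="large">\n<a href="https://google.com">\n<span>get the content</span>\n</a>\n\n</h3>' +
  '<h3 class="large">\n<a href="https://google.com">\n<span>get the content</span>\n</a>\n\n</h3>';

// Hypothetical helper: collect capture group 1 of every match of a global regex.
function captureAll(regex, str) {
  var results = [];
  var match;
  while ((match = regex.exec(str)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

// Greedy: [\s\S]* runs to the LAST </h3>, so both blocks merge into one match.
var greedy = captureAll(/<h3[^>]*>([\s\S]*)<\/h3>/g, html);

// Lazy: [\s\S]*? stops at the FIRST </h3> after each opening tag.
var lazy = captureAll(/<h3[^>]*>([\s\S]*?)<\/h3>/g, html);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
```

With the lazy version, each entry of lazy holds the inner markup of one h3 block, which is the result the question asks for.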

Answer 1: (score: 2)

You can try matching the a elements directly:

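function getA(str) {
  // Lazily match each <a ...>...</a> element; [\s\S] also matches newlines.
  var regex = /<a\s+[\s\S]+?<\/a>/g;
  var found; // declared locally to avoid creating an implicit global
  while ((found = regex.exec(str)) !== null) {
    document.write(found[0] + '<br>');
  }
}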

var str = '<h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3><h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3>';
getA(str);
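If the matches are needed as data rather than written into the page, a variant along these lines might be easier to reuse, for example outside a browser. It is a minimal sketch: extractAnchors is a hypothetical name, the sample string is shortened, and the pattern is the same one used in getA above.

```javascript
// Hypothetical variant of getA that returns the matches instead of printing them.
function extractAnchors(str) {
  // With the g flag, String.prototype.match returns all full matches, or null if none.
  return str.match(/<a\s+[\s\S]+?<\/a>/g) || [];
}

var sample =
  '<h3><a href="https://google.com">\n<span>one</span>\n</a></h3>' +
  '<h3><a href="https://google.com">\n<span>two</span>\n</a></h3>';

var anchors = extractAnchors(sample);
console.log(anchors.length);                             // 2
console.log(extractAnchors('<p>no links</p>').length);   // 0
```

Falling back to an empty array keeps callers from having to check for null before looping over the result.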