Question

I have the following code in an HTML string.

<h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3>

I want to extract the following markup:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

I wrote the following regular expression:

<h3[^>]+?>(.*)<\/h3>

But it returns the wrong result:

<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

</h3><h3 class="large lheight20 margintop10">
<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">
<span>get the content</span>
</a>

Please help me extract the tags.

Answer 1

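Use this regular expression. The lazy +? stops at the first </h3> instead of the last one, and [^$] matches any character other than a literal $, newlines included: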

<h3[^>]+?>([^$]+?)<\/h3>

Example here:

https://regex101.com/r/pQ5nE0/2
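To make the difference between the greedy and the lazy quantifier concrete, here is a minimal sketch run against the markup from the question. It is not the exact pattern above: it assumes [\s\S] in place of [^$] so that a literal $ in the content would not end the match early, and extractH3Contents is a hypothetical helper name used only for illustration.

```javascript
// Sample input copied from the question: two adjacent <h3> blocks.
var html =
  '<h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3><h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3>';

// Hypothetical helper: returns the trimmed inner markup of every <h3>.
function extractH3Contents(str) {
  // Lazy +? ends each capture at the nearest </h3>; [\s\S] also matches newlines.
  var regex = /<h3[^>]*>([\s\S]+?)<\/h3>/g;
  var results = [];
  var match;
  while ((match = regex.exec(str)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

var lazy = extractH3Contents(html);

// Greedy variant for comparison: one capture that runs to the last </h3>.
var greedy = html.match(/<h3[^>]*>([\s\S]+)<\/h3>/);

console.log(lazy.length);                        // number of <h3> blocks found
console.log(greedy[1].indexOf('</h3>') !== -1);  // greedy capture swallows a closing tag
```

The greedy capture contains the first </h3> and the second opening <h3>, which is the wrong result shown in the question; the lazy version yields one entry per heading. This only holds for flat markup like the sample, since a regex cannot track nested tags.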

Answer 2

You can try:

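function getA(str) {
  // Lazily match each <a ...>...</a> element; [\s\S] also matches newlines.
  var regex = /<a\s+[\s\S]+?<\/a>/g;
  var found; // declared locally so the loop does not create an implicit global
  while ((found = regex.exec(str)) !== null) {
    document.write(found[0] + '<br>');
  }
}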

var str = '<h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3><h3 class="large lheight20 margintop10">\n' +
  '<a href="https://google.com" class="marginright5 link linkWithHash detailsLink">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '\n' +
  '</h3>';
getA(str);
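Since document.write is only available in a browser, a variant that returns the matches as an array may be easier to reuse or test elsewhere. The following is a sketch under that assumption, not part of the original answer; collectAnchors is a hypothetical name, and String.prototype.matchAll requires an ES2020 environment.

```javascript
// Hypothetical variant of getA that returns the matches instead of writing them.
function collectAnchors(str) {
  // Same pattern as getA; matchAll requires the g flag.
  var regex = /<a\s+[\s\S]+?<\/a>/g;
  // Each iterator entry is a match array; index 0 holds the full <a>...</a> text.
  return Array.from(str.matchAll(regex), function (m) {
    return m[0];
  });
}

// Shortened sample input with the same shape as the question's markup.
var sample =
  '<h3 class="large">\n' +
  '<a href="https://google.com" class="link">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '</h3><h3 class="large">\n' +
  '<a href="https://google.com" class="link">\n' +
  '<span>get the content</span>\n' +
  '</a>\n' +
  '</h3>';

var anchors = collectAnchors(sample);
console.log(anchors.length);
console.log(anchors[0]);
```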

Regex to extract the children of an HTML tag?
