Question

I have a string of HTML code.

<h2 class="some-class"> 
   <a href="#link" class="link" id="first-link"
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>

I need to get only the text content of the h2 itself. I wrote this regular expression:

(?<=>)(.*)(?=<\/h2>)

It works when the h2 has no inner tags. Otherwise I get:

   <a href="#link" class="link" id="first-link"
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
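
To make the behaviour described above concrete, here is a minimal sketch that runs the same pattern against two inputs. The strings `plain` and `nested` are simplified, single-line stand-ins for the HTML in the question (an assumption, since `.` does not match newlines unless the `s` flag is set). Because `.*` is greedy and the lookbehind only asks for any preceding `>`, the match starts right after the opening h2 tag and swallows every inner tag.

```javascript
// The pattern from the question, unchanged.
const pattern = /(?<=>)(.*)(?=<\/h2>)/;

// Hypothetical single-line inputs, simplified from the question's HTML.
const plain = '<h2 class="some-class">NEED TO GET THIS</h2>';
const nested = '<h2 class="some-class"><a href="#link"><span class="bold">link</span></a>NEED TO GET THIS</h2>';

// No inner tags: the capture is exactly the heading text.
const plainMatch = plain.match(pattern)[1];
console.log(plainMatch); // NEED TO GET THIS

// Inner tags: the capture starts after the first ">" and includes the markup.
const nestedMatch = nested.match(pattern)[1];
console.log(nestedMatch); // <a href="#link"><span class="bold">link</span></a>NEED TO GET THIS
```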

Answer 1

Never use regular expressions to parse HTML. See these famous answers:

Using regular expressions to parse HTML: why not?

RegEx match open tags except XHTML self-contained tags

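Instead, create a temporary element, set the string as its HTML, and get the content by keeping only the text nodes that are direct children of the h2.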

var str = `<h2 class="some-class"> 
   <a href="#link" class="link" id="first-link"
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>`;

// generate a temporary DOM element
var temp = document.createElement('div');
// set content
temp.innerHTML = str;
// get the h2 element
var h2 = temp.querySelector('h2');

console.log(
  // get all child nodes and convert into array
  // for older browser use [].slice.call(h2...)
  Array.from(h2.childNodes)
  // iterate over elements
  .map(function(e) {
    // if text node then return the content, else return 
    // empty string
    return e.nodeType === 3 ? e.textContent.trim() : '';
  })
  // join the string array
  .join('')
  // you can use reduce method instead of map
  // .reduce(function(s, e) { return s + (e.nodeType === 3 ? e.textContent.trim() : ''); }, '') 
)

Reference:

Fastest way to convert JavaScript NodeList to Array?
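
The same idea can be wrapped in a small reusable function. This is only a sketch: `getOwnText` is a hypothetical name, not part of any library, and it makes one design choice that differs from the snippet above. Whitespace-only text nodes are dropped and the remaining pieces are joined with a single space, so text on either side of an inner tag is not glued together.

```javascript
// Hypothetical helper: returns the text that belongs directly to `element`,
// ignoring anything inside child elements.
function getOwnText(element) {
  return Array.from(element.childNodes)
    // nodeType 3 is a text node (Node.TEXT_NODE)
    .filter((node) => node.nodeType === 3)
    // strip the indentation and line breaks around each piece
    .map((node) => node.textContent.trim())
    // drop text nodes that contained only whitespace
    .filter((text) => text.length > 0)
    // assumption: separate pieces with one space instead of ''
    .join(' ');
}
```

In a browser it could be called as `getOwnText(temp.querySelector('h2'))`, using the `temp` element built above.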

Answer 2


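// assumes the h2 is already part of the current document
var h2 = document.querySelector('h2');

// work on a deep copy so the page itself is left untouched
var h2_clone = h2.cloneNode(true);

// children is a live collection, so copy it into an array first;
// removing elements while iterating the live list would skip some of them
for (let el of Array.from(h2_clone.children)) {
    el.remove();
}

// only the h2's own text is left; trim the surrounding whitespace
alert(h2_clone.innerText.trim());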

Answer 3

Regex is not a good fit for parsing HTML, but if your HTML is not valid or you prefer to use a regular expression anyway, then:

(?!>)([^><]+)(?=<\/h2>)


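It matches the text right before the closing </h2> (if it exists).
To avoid empty results, * was changed to +.
This regex is very limited and only fits narrow cases like the one described in the question.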
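
A short sketch of how this pattern behaves in JavaScript, including the "if it exists" caveat. The `sample` string is a well-formed variant of the question's HTML and `textFirst` is a made-up counterexample; both are assumptions for illustration. The capture keeps the surrounding line breaks and indentation, so it is trimmed afterwards.

```javascript
// The pattern from this answer, unchanged.
const lastTextBeforeClose = /(?!>)([^><]+)(?=<\/h2>)/;

// Assumed sample: the text sits directly before the closing </h2>.
const sample = `<h2 class="some-class">
   <a href="#link" class="link" id="first-link">
      <span class="bold">link</span>
   </a>
   NEED TO GET THIS
</h2>`;

const hit = sample.match(lastTextBeforeClose);
const trailingText = hit ? hit[1].trim() : null;
console.log(trailingText); // NEED TO GET THIS

// Counterexample: the text comes before the inner tag, so nothing
// tag-free sits directly in front of </h2> and there is no match.
const textFirst = '<h2>NEED TO GET THIS <a href="#link">link</a></h2>';
const miss = textFirst.match(lastTextBeforeClose);
console.log(miss); // null
```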

Regex. Get only text content of tag (without inner tags)
