Regex to get the words between HTML tags

Date: 2017-12-22 08:22:42

Tags: javascript html regex

I have this string:

<p><ins>Article </ins>Title</p> 

<p>Here&#39;s some sample text</p>

I want to extract the words into an array, ignoring the HTML tags, i.e.

['Article','Title','Here&#39;s','some','sample','text']

I tried to write a regular expression for this, but couldn't get it to work. Thanks in advance.

2 answers:

Answer 0 (score: 5)

Put the string into a dummy div and read its innerText:

var str = `<p><ins>Article </ins>Title</p> 
<p>Here&#39;s some sample text</p>`;

var div = document.createElement( "div" );
div.innerHTML = str; //assign str as innerHTML
var text = div.innerText; //get text only

var output = text.split( /\s+/ ); //split by one or more spaces including line feeds
console.log( output );
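
Note that reading text back from a DOM element decodes character references, so the result contains Here's rather than the literal Here&#39;s shown in the question. If the undecoded form matters, or no DOM is available, a regex-only version might look like the sketch below. The helper name wordsBetweenTags is hypothetical, and the pattern assumes simple markup: it will misbehave on a ">" inside an attribute value, on comments, and on script or style content.

```javascript
// Hypothetical helper: strips tags with a regex and splits what remains into words.
// Character references such as &#39; are left untouched.
function wordsBetweenTags(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace each tag with a space so adjacent blocks don't merge
    .split(/\s+/)             // split on runs of whitespace, including line feeds
    .filter(Boolean);         // drop empty strings left by leading/trailing whitespace
}

const sample = `<p><ins>Article </ins>Title</p> 
<p>Here&#39;s some sample text</p>`;

console.log(wordsBetweenTags(sample));
```

Replacing each tag with a space keeps words in neighbouring block elements apart, at the cost of also splitting a word that is interrupted by an inline tag (for example <b>A</b>rticle).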

Answer 1 (score: 3)

You don't need a regular expression; just use the browser's API:

const html = "<p><ins>Article </ins>Title</p> <p>Here&#39;s some sample text</p>";
const div = document.createElement("div");
div.innerHTML = html;

// This will extract the text (remove the HTML tags)
const text = div.textContent || div.innerText || "";
console.log(text);

// Then you can simply split the string
const result = text.split(' ');
console.log(result);
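
One caveat: split(' ') splits on single space characters only, so consecutive spaces produce empty strings and line feeds are not treated as separators. A more tolerant splitting step could look like this minimal sketch (the helper name splitWords is hypothetical and not part of any browser API):

```javascript
// Hypothetical helper: split text into words on any run of whitespace.
function splitWords(text) {
  return text
    .trim()           // remove leading/trailing whitespace
    .split(/\s+/)     // split on one or more whitespace characters, including line feeds
    .filter(Boolean); // an empty input yields [] instead of [""]
}

// Splitting on a single space keeps an empty string between two spaces.
console.log("a  b".split(" ")); // ["a", "", "b"]

// Splitting on whitespace runs handles double spaces and line feeds alike.
console.log(splitWords(" Article Title \nHere's  some sample text "));
```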