I have this string:
<p><ins>Article </ins>Title</p>
<p>Here's some sample text</p>
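I want to get the words as an array, ignoring the HTML tags, i.e.
["Article", "Title", "Here's", "some", "sample", "text"]
I tried to write a regular expression for this, but couldn't get it to work. Thanks in advance.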
Answer 0 (score: 5)
Put the string into a dummy div and read its innerText:
var str = `<p><ins>Article </ins>Title</p>
<p>Here's some sample text</p>`;
var div = document.createElement( "div" );
div.innerHTML = str; //assign str as innerHTML
var text = div.innerText; //get text only
var output = text.trim().split( /\s+/ ); //split on runs of whitespace, including line feeds
console.log( output );
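To illustrate why the split uses `\s+` rather than a single space, here is a small sketch. It assumes the extracted text is the two paragraphs joined by the newline that sits between the `<p>` elements in the original string; the variable names are only illustrative.

```javascript
// Assumed value of the extracted text for the two-line string above.
const text = "Article Title\nHere's some sample text";

// Splitting on a literal space leaves "Title\nHere's" fused together.
const bySpace = text.split(" ");

// Splitting on runs of whitespace also breaks at the newline.
const byWhitespace = text.trim().split(/\s+/);

console.log(bySpace.length);      // 5
console.log(byWhitespace.length); // 6
```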
Answer 1 (score: 3)
You don't need a regular expression; just use the browser's API:
const html = "<p><ins>Article </ins>Title</p> <p>Here's some sample text</p>";
const div = document.createElement("div");
div.innerHTML = html;
// This will extract the text (remove the HTML tags)
const text = div.textContent || div.innerText || "";
console.log(text);
// Then split on runs of whitespace (a plain ' ' would miss newlines between tags)
const result = text.trim().split(/\s+/);
console.log(result);
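If no DOM is available (for example in Node.js, where `document` does not exist), a rough fallback could look like the sketch below. `htmlToWords` is a hypothetical helper, not part of either answer, and the regex is only a naive tag stripper: it does not decode entities such as `&amp;`, it mishandles a `>` inside an attribute value, and because each tag is replaced by a space, a word wrapped partly in an inline tag (`<b>Art</b>icle`) would be split in two. For anything beyond simple markup like the sample, a real HTML parser is the safer choice.

```javascript
// Hypothetical helper: naive tag stripping for environments without a DOM.
function htmlToWords(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace each tag with a space so neighbouring blocks don't fuse
    .trim()
    .split(/\s+/)             // split on runs of whitespace, including newlines
    .filter(Boolean);         // drop the empty string produced by empty input
}

const sample = `<p><ins>Article </ins>Title</p>
<p>Here's some sample text</p>`;

console.log(htmlToWords(sample));
```

For the sample string this yields the six words asked for in the question.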