How to split an HTML string into an array of words and tags

Posted: 2018-04-18 19:10:20

Tags: javascript regex

How can I split an HTML string into an array so that each word is an item in the array, together with any tags surrounding it?

//So this string:
var myHTMLString = "Something, something <span @click='changeSelected(0)' id='0' class='wrong'>else</span> is foo <span @click='changeSelected(0)' id='0' class='wrong'>hello world</span> to all.";

//Would become this:
var HTMLAry = ["Something,", "something", "<span @click='changeSelected(0)' id='0' class='wrong'>else</span>", "is", "foo", "<span @click='changeSelected(0)' id='0' class='wrong'>hello world</span>", "to", "all."];

Things we can rely on:

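  • The tags are always span tags, with exactly the attributes shown in the example above.
  • Not every word is wrapped in a span tag.
  • Some words may be separated by more than one space.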
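Because words may be separated by more than one space, splitting text on a single space character leaves empty strings in the result. Splitting on the regex `/\s+/` and then dropping empty strings handles runs of spaces (and tabs or newlines). A small illustration; the helper name `splitWords` is hypothetical:

```javascript
// Hypothetical helper: split a run of plain text into words,
// tolerating multiple spaces, tabs, or newlines between them.
function splitWords(text) {
  // split(/\s+/) still leaves "" at either end when the text starts
  // or ends with whitespace, so empty strings are filtered out.
  return text.split(/\s+/).filter(Boolean);
}

console.log(" is  foo ".split(" ")); // ['', 'is', '', 'foo', '']
console.log(splitWords(" is  foo ")); // ['is', 'foo']
```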

How can I accomplish this?

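The only thing I can think of that might help here is some kind of regular expression, but other somewhat similar answers say that you should generally stay away from regex when dealing with HTML markup. Still, a regex is the only workable approach I can come up with.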

var myHTMLString = "Something, something <span @click='changeSelected(0)' id='0' class='wrong'>else</span> is foo <span @click='changeSelected(0)' id='0' class='wrong'>hello world</span> to all.";

//This^ would become this:

var HTMLAry = ["Something,", "something", "<span @click='changeSelected(0)' id='0' class='wrong'>else</span>", "is", "foo", "<span @click='changeSelected(0)' id='0' class='wrong'>hello world</span>", "to", "all."];

// My attempt so far: this only returns the two <span> elements, not the plain words.
console.log(myHTMLString.match(/<span.*?>.*?<\/span\>/g));
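Given how restricted the input is, a regex is workable here. One way to extend the attempt above is to add a second alternative, `\S+`, so that bare words are captured alongside the span elements. This is a sketch under stated assumptions: spans are never nested, no attribute value contains `>`, and every span is separated from neighbouring words by whitespace. The function name `tokenize` is hypothetical:

```javascript
// Assumes the constraints from the question: the only tags are
// <span ...>...</span>, spans are never nested, no attribute value
// contains ">", and spans are separated from other words by whitespace.
function tokenize(html) {
  // Alternative 1: a whole span element, matched lazily up to the
  //   first closing </span> ([\s\S] also allows newlines inside).
  // Alternative 2: any run of non-whitespace characters (a bare word).
  const pattern = /<span\b[^>]*>[\s\S]*?<\/span>|\S+/g;
  // match() returns null when nothing matches, e.g. for "".
  return html.match(pattern) || [];
}

const myHTMLString = "Something, something <span @click='changeSelected(0)' id='0' class='wrong'>else</span> is foo <span @click='changeSelected(0)' id='0' class='wrong'>hello world</span> to all.";

// Logs the same eight items as HTMLAry above.
console.log(tokenize(myHTMLString));
```

Because the matches are copied straight from the input string, the single-quoted attributes come back exactly as written. If the markup ever becomes more varied (nested tags, other elements), parsing it with the DOM is the safer route.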

1 Answer

Answer 0 (score: 1)

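Create an element, set its innerHTML to your string, and take its child nodes. Split the text nodes on whitespace and filter out the empty strings, take the outerHTML of the other nodes, and then flatten the array.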
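The core of this approach (keep elements whole, split text nodes into words, flatten) only needs the node objects, not the rest of the browser. A minimal sketch, assuming node-like objects with the DOM's `nodeType`, `nodeValue` and `outerHTML` properties; the function name `tokensFromNodes` and the sample nodes are hypothetical stand-ins so the logic can run without a DOM:

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in the DOM
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE in the DOM

function tokensFromNodes(nodes) {
  // flatMap flattens one level, replacing the separate concat step.
  return nodes.flatMap(node => {
    if (node.nodeType === ELEMENT_NODE) return [node.outerHTML]; // element kept whole
    if (node.nodeType === TEXT_NODE) return node.nodeValue.split(/\s+/).filter(Boolean); // words
    return []; // comments and other node types are skipped
  });
}

// Hypothetical stand-ins for some child nodes of the example string.
const nodes = [
  { nodeType: 3, nodeValue: "Something, something " },
  { nodeType: 1, outerHTML: '<span id="0">else</span>' },
  { nodeType: 3, nodeValue: " is  foo " },
];

// returns ['Something,', 'something', '<span id="0">else</span>', 'is', 'foo']
console.log(tokensFromNodes(nodes));
```

In a browser you would pass `Array.from(el.childNodes)` instead of the stand-in array. Checking `nodeType` explicitly, rather than relying on `outerHTML || ...`, makes it clear which node types are kept, split, or ignored.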

var myHTMLString = "Something, something <span @click='changeSelected(0)' id='0' class='wrong'>else</span> is foo <span @click='changeSelected(0)' id='0' class='wrong'>hello world</span> to all.";

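// Let the browser's HTML parser build the nodes inside a detached <div>.
var el = document.createElement('div');
el.innerHTML = myHTMLString;

// Elements are kept whole via outerHTML; text nodes are split on runs of
// whitespace, and the empty strings left at the edges are dropped.
var arr = Array.from(el.childNodes).map(e => e.outerHTML || e.nodeValue.split(/\s+/).filter(t => t));

// Flatten one level: [[words...], "<span ...>...</span>", ...] -> flat list.
// Note: outerHTML re-serializes attributes, so the spans come back with
// double-quoted values (id="0") rather than the original single quotes.
console.log([].concat.apply([], arr));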