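I am parsing content generated by a WYSIWYG editor into a table-of-contents widget in React.
So far I am looping through the headings and adding them to arrays.
How can I combine them all into one multidimensional array or object (whichever is the better approach), so that it looks more like this: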
h1-1
  h2-1
    h3-1
h1-2
  h2-2
    h3-2
h1-3
  h2-3
    h3-3
I could then render it in the UI as an ordered list.
const str = "<h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3>";
const patternh1 = /<h1>(.*?)<\/h1>/g;
const patternh2 = /<h2>(.*?)<\/h2>/g;
const patternh3 = /<h3>(.*?)<\/h3>/g;
let h1s = [];
let h2s = [];
let h3s = [];
let matchh1, matchh2, matchh3;
while (matchh1 = patternh1.exec(str))
  h1s.push(matchh1[1]);
while (matchh2 = patternh2.exec(str))
  h2s.push(matchh2[1]);
while (matchh3 = patternh3.exec(str))
  h3s.push(matchh3[1]);
console.log(h1s)
console.log(h2s)
console.log(h3s)
Answer 0 (score: 13)
I don't know about you, but I hate parsing HTML with regular expressions. I think it is better to let the DOM handle this instead:
const str = `<h1>h1-1</h1>
<h3>h3-1</h3>
<h3>h3-2</h3>
<p>something</p>
<h1>h1-2</h1>
<h2>h2-2</h2>
<h3>h3-2</h3>`;
const wrapper = document.createElement('div');
wrapper.innerHTML = str.trim();
let tree = [];
let leaf = null;
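for (const node of wrapper.querySelectorAll("h1, h2, h3, h4, h5, h6")) {
  const nodeLevel = parseInt(node.tagName[1], 10);

  // Climb up until `leaf` is a heading of a strictly higher rank,
  // i.e. the real parent of the new heading (or null at the top level).
  while (leaf && nodeLevel <= leaf.level)
    leaf = leaf.parent;

  const newLeaf = {
    level: nodeLevel,
    text: node.textContent,
    children: [],
    parent: leaf // note: this back-reference makes the tree circular
  };

  if (!leaf)
    tree.push(newLeaf);
  else
    leaf.children.push(newLeaf);

  leaf = newLeaf;
}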
console.log(tree);
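This answer does not require an h3 to follow an h2; an h3 can follow an h1 directly if you like. If you want to turn the result into an ordered list, that can be done as well: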
const str = `<h1>h1-1</h1>
<h3>h3-1</h3>
<h3>h3-2</h3>
<p>something</p>
<h1>h1-2</h1>
<h2>h2-2</h2>
<h3>h3-2</h3>`;
const wrapper = document.createElement('div');
wrapper.innerHTML = str.trim();
let tree = [];
let leaf = null;
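for (const node of wrapper.querySelectorAll("h1, h2, h3, h4, h5, h6")) {
  const nodeLevel = parseInt(node.tagName[1], 10);

  // Climb up until `leaf` is a heading of a strictly higher rank,
  // i.e. the real parent of the new heading (or null at the top level).
  while (leaf && nodeLevel <= leaf.level)
    leaf = leaf.parent;

  const newLeaf = {
    level: nodeLevel,
    text: node.textContent,
    children: [],
    parent: leaf // note: this back-reference makes the tree circular
  };

  if (!leaf)
    tree.push(newLeaf);
  else
    leaf.children.push(newLeaf);

  leaf = newLeaf;
}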
const ol = document.createElement("ol");
(function makeOl(ol, leaves) {
  for (const leaf of leaves) {
    const li = document.createElement("li");
    li.appendChild(new Text(leaf.text));
    if (leaf.children.length > 0) {
      const subOl = document.createElement("ol");
      makeOl(subOl, leaf.children);
      li.appendChild(subOl);
    }
    ol.appendChild(li);
  }
})(ol, tree);

// add it to the DOM
document.body.appendChild(ol);
// or get it as text
const result = ol.outerHTML;
Because the HTML is parsed by the DOM rather than by a regular expression, this solution does not break if, for example, the h1 tags carry attributes.
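The same nesting idea can also be sketched without parent back-references, which keeps the tree free of cycles so it can be passed to JSON.stringify or stored in React state. This is only a minimal sketch, not the answer's code: `buildHeadingTree` and the `{ level, text }` input shape are hypothetical, and the flat list is assumed to have been collected beforehand (for instance from a querySelectorAll loop like the one above).

```javascript
// Hypothetical helper: nest a flat, document-ordered list of headings.
// Assumes each item looks like { level: 1..6, text: "..." }.
function buildHeadingTree(headings) {
  const tree = [];
  const stack = []; // currently open ancestors, outermost first

  for (const { level, text } of headings) {
    const node = { level, text, children: [] };

    // Close every open heading that cannot be an ancestor of this one.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }

    const parent = stack[stack.length - 1];
    (parent ? parent.children : tree).push(node);
    stack.push(node);
  }
  return tree;
}

const sample = [
  { level: 1, text: "h1-1" },
  { level: 3, text: "h3-1" },
  { level: 3, text: "h3-2" },
  { level: 1, text: "h1-2" },
  { level: 2, text: "h2-2" },
  { level: 3, text: "h3-2" },
];
const headingTree = buildHeadingTree(sample);
console.log(JSON.stringify(headingTree, null, 2));
```

The stack only ever holds the chain of open ancestors, so each heading is pushed and popped at most once.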
Answer 1 (score: 8)
You could simply collect all h* elements and then iterate over them to build a tree, like this:
Using ES6 (I inferred this is acceptable from the use of const and let):
const str = `
<h1>h1-1</h1>
<h2>h2-1</h2>
<h3>h3-1</h3>
<p>something</p>
<h1>h1-2</h1>
<h2>h2-2</h2>
<h3>h3-2</h3>
`
const patternh = /<h(\d)>(.*?)<\/h(\d)>/g;
let hs = [];
let matchh;
while (matchh = patternh.exec(str))
  hs.push({ lev: matchh[1], text: matchh[2] })
console.log(hs)

// constructs a tree with the format [{ value: ..., children: [{ value: ..., children: [...] }, ...] }, ...]
const add = (res, lev, what) => {
  if (lev === 0) {
    res.push({ value: what, children: [] });
  } else {
    add(res[res.length - 1].children, lev - 1, what);
  }
}

// reduces all hs found into a tree using above method starting with an empty list
const tree = hs.reduce((res, { lev, text }) => {
  add(res, lev-1, text);
  return res;
}, []);
console.log(tree);
But because your HTML headings are not in a tree structure themselves (which I guess is your use case), this only works under certain assumptions, for example: you cannot have an <h3> unless there is an <h2> above it and an <h1> above that. It also assumes that a lower-level heading always belongs to the most recent heading of the next higher level.
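If you want to use the tree structure further, for example to render a representative ordered list for the TOC, you could do the following: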
// function to render a bunch of <li>s
const renderLIs = children => children.map(child => `<li>${renderOL(child)}</li>`).join('');
// function to render an <ol> from a tree node
const renderOL = tree => tree.children.length > 0 ? `<ol>${tree.value}${renderLIs(tree.children)}</ol>` : tree.value;
// use a root node for the TOC
const toc = renderOL({ value: 'TOC', children: tree });
console.log(toc);
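If markup in which every entry sits in its own <li> is preferred, a variant along the following lines may be easier to style. It is only a sketch: `renderList` and `escapeHtml` are hypothetical helpers, and the `{ value, children }` node shape produced by the reduce above is assumed.

```javascript
// Hypothetical helper: escape heading text before placing it in markup.
const escapeHtml = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Render an array of { value, children } nodes as one <ol>, nesting a
// further <ol> inside the <li> of every node that has children.
const renderList = (nodes) =>
  "<ol>" +
  nodes
    .map(
      (n) =>
        "<li>" +
        escapeHtml(n.value) +
        (n.children.length > 0 ? renderList(n.children) : "") +
        "</li>"
    )
    .join("") +
  "</ol>";

const sampleTree = [
  {
    value: "h1-1",
    children: [{ value: "h2-1", children: [{ value: "h3-1", children: [] }] }],
  },
  { value: "h1-2", children: [] },
];
const tocHtml = renderList(sampleTree);
console.log(tocHtml);
```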
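Hope it helps.
Answer 2 (score: 5)
What you want to do is known as (a variant of) document outlining, i.e. creating a nested list from a document's headings that respects their hierarchy.
A simple implementation for the browser using the DOM and DOMParser APIs looks as follows (placed in an HTML page and written in ES5 for easy testing):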
<!DOCTYPE html>
<html>
<head>
<title>Document outline</title>
</head>
<body>
<div id="outline"></div>
<script>
// test string wrapped in a document (and body) element
var str = "<html><body><h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3></body></html>";
// util for traversing a DOM and emit SAX startElement events
function emitSAXLikeEvents(node, handler) {
handler.startElement(node)
for (var i = 0; i < node.children.length; i++)
emitSAXLikeEvents(node.children.item(i), handler)
handler.endElement(node)
}
var outline = document.getElementById('outline')
var rank = 0
var context = outline
emitSAXLikeEvents(
(new DOMParser()).parseFromString(str, "text/html").body,
{
startElement: function(node) {
if (/h[1-6]/.test(node.localName)) {
var newRank = +node.localName.substr(1, 1)
// set context li node to append
while (newRank <= rank--)
context = context.parentNode.parentNode
rank = newRank
// create (if 1st li) or
// get (if 2nd or subsequent li) ol element
var ol
if (context.children.length > 0)
ol = context.children[0]
else {
ol = document.createElement('ol')
context.appendChild(ol)
}
// create and append li with text from
// heading element
var li = document.createElement('li')
li.appendChild(
document.createTextNode(node.innerText))
ol.appendChild(li)
context = li
}
},
endElement: function(node) {}
})
</script>
</body>
</html>
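I first parse your fragment into a Document, then traverse it to produce SAX-like startElement() calls. In the startElement() function, the rank of a heading element is checked against the rank of the most recently created list item (if any). A new list item is then appended at the correct hierarchy level, and an ol element is created as its container if needed. Note that the algorithm as it stands won't handle "jumping" from h1 to h3 in the hierarchy, but it can easily be adapted.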
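One way such an adaptation for skipped ranks might look, shown here independent of the DOM: keep the ranks of the currently open list items on a stack and use the stack depth, rather than the raw rank, to decide the nesting. `headingDepths` is a hypothetical helper and not part of the code above.

```javascript
// Hypothetical helper: map heading ranks (1..6, in document order) to
// nesting depths, so that a jump such as h1 -> h3 nests only one level.
function headingDepths(ranks) {
  const open = []; // ranks of the currently open list items

  return ranks.map((rank) => {
    // Leave every open item whose rank is the same as or lower than the new one.
    while (open.length > 0 && open[open.length - 1] >= rank) {
      open.pop();
    }
    open.push(rank);
    return open.length - 1; // 0 means the top-level list
  });
}

const depths = headingDepths([1, 3, 3, 2, 1, 2, 3]);
console.log(depths);
```

In a startElement handler, the difference between the previous depth and the new one would then tell how many li/ol pairs to climb before appending.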
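If you want to create an outline or table of contents on node.js, the code could be made to run server-side, but it requires a decent HTML parsing library (a DOMParser polyfill for node.js, so to speak). There are also the https://github.com/h5o/h5o-js and https://github.com/hoyois/html5outliner packages for creating outlines, though I haven't tested them. Presumably these packages can also deal with corner cases, such as heading elements inside iframe and quote elements, which you generally don't want in the outline of your document.
The topic of outlining HTML5 has a long history; see e.g. http://html5doctor.com/computer-says-no-to-html5-document-outline/. HTML4's practice of using no sectioning-root wrapper elements (in HTML5 parlance) for sectioning, and of placing headings and content at the same hierarchy level, is known as "flat-earth markup". SGML has the RANK feature for dealing with ranked elements such as H1, H2, and so on, and it can be made to infer omitted section elements, thus automatically creating an outline from HTML4-like "flat-earth markup" in simple cases (e.g. where only section or another single element is allowed as the sectioning root).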
Answer 3 (score: 2)
I would use a single regular expression to get the <hx></hx> contents and then sort them by x using Array.reduce.
Here is the basis, but it is not finished yet:
// The string you need to parse
const str = "\
<h1>h1-1</h1>\
<h2>h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2</h1>\
<h2>h2-2</h2>\
<h3>h3-2</h3>";
// The regex that will cut down the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;
// We get the matches now
const matches = str.match(regex);
// We match the hx togethers as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
// We get the number behind hx ---> the x
const hNumber = x[2];
// If the container do not exist, create it
if (!tmp[hNumber]) {
tmp[hNumber] = [];
}
// Push the new parsed content into the array
// 4 is to start after <hx>
// length - 9 is to get all except <hx></hx>
tmp[hNumber].push(x.substr(4, x.length - 9));
return tmp;
}, {}));
console.log(matchesSorted);
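When parsing HTML content, keep some special cases in mind, such as the presence of \n or spaces. For example, look at the following non-working snippet: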
// The string you need to parse
const str = "\
<h1>h1-1\n\
</h1>\
<h2> h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2 </h1>\
<h2>h2-2 \n\
</h2>\
<h3>h3-2</h3>";
// The regex that will cut down the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;
// We get the matches now
const matches = str.match(regex);
// We match the hx togethers as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
// We get the number behind hx ---> the x
const hNumber = x[2];
// If the container do not exist, create it
if (!tmp[hNumber]) {
tmp[hNumber] = [];
}
// Push the new parsed content into the array
// 4 is to start after <hx>
// length - 9 is to get all except <hx></hx>
tmp[hNumber].push(x.substr(4, x.length - 9));
return tmp;
}, {}));
console.log(matchesSorted);
We have to add .replace() and .trim() to remove the unwanted \n and spaces.
Use this snippet:
// The string you need to parse
const str = "\
<h1>h1-1\n\
</h1>\
<h2> h2-1</h2>\
<h3>h3-1</h3>\
<p>something</p>\
<h1>h1-2 </h1>\
<h2>h2-2 \n\
</h2>\
<h3>h3-2</h3>";
// Remove all unwanted \n
const preparedStr = str.replace(/(\r\n\t|\n|\r\t)/gm, "");
// The regex that will cut down the <hx>something</hx>
const regex = /<h[0-9]{1}>(.*?)<\/h[0-9]{1}>/g;
// We get the matches now
const matches = preparedStr.match(regex);
// We match the hx togethers as requested
const matchesSorted = Object.values(matches.reduce((tmp, x) => {
// We get the number behind hx ---> the x
const hNumber = x[2];
// If the container do not exist, create it
if (!tmp[hNumber]) {
tmp[hNumber] = [];
}
// Push the new parsed content into the array
// 4 is to start after <hx>
// length - 9 is to get all except <hx></hx>
// call trim() to remove unwanted spaces
tmp[hNumber].push(x.substr(4, x.length - 9).trim());
return tmp;
}, {}));
console.log(matchesSorted);
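As an alternative to stripping line breaks first, the pattern itself can be made tolerant of them. The following is a sketch under assumptions rather than a robust parser: `headingPattern` and `byLevel` are hypothetical names, and regex-based HTML parsing remains fragile for nested or unusual markup.

```javascript
// Sample input with a line break, extra spaces, and an attribute.
const html =
  '<h1>h1-1\n</h1><h2 class="x"> h2-1</h2><h3>h3-1</h3>' +
  "<p>something</p><h1>h1-2 </h1>";

// \1 forces the closing tag to carry the same digit as the opening tag;
// [\s\S]*? matches across line breaks, which "." does not do by default;
// \b[^>]* allows attributes on the opening tag.
const headingPattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/g;

// Group the trimmed heading texts by level.
const byLevel = {};
for (const [, level, inner] of html.matchAll(headingPattern)) {
  if (!byLevel[level]) {
    byLevel[level] = [];
  }
  byLevel[level].push(inner.trim());
}
console.log(byLevel);
```

Capturing the level and the inner text as groups also avoids the fixed substr offsets, which stop working as soon as a tag has attributes.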
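Answer 4 (score: 2)
I wrote this code to work with jQuery. (Please don't downvote; someone may need a jQuery answer later.)
This recursive function builds li strings, and if an item has children it converts them into an ol.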
const str =
"<div><h1>h1-1</h1><h2>h2-1</h2><h3>h3-1</h3></div><p>something</p><h1>h1-2</h1><h2>h2-2</h2><h3>h3-2</h3>";
function strToList(stri) {
const tags = $(stri);
function partToList(el) {
let output = "<li>";
if ($(el).children().length) {
output += "<ol>";
$(el)
.children()
.each(function() {
output += partToList($(this));
});
output += "</ol>";
} else {
output += $(el).text();
}
return output + "</li>";
}
let output = "<ol>";
tags.each(function(itm) {
output += partToList($(this));
});
return output + "</ol>";
}
$("#output").append(strToList(str));
li {
padding: 10px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="output"></div>
(This code can easily be converted to plain JS.)