Question

Given the following HTML:

<html>
    <head>
        <title>This is text within the title tag</title>
    </head>
    <body>
        This is text in the body tag
        <br>
        <h1>This is text in the h1 tag</h1>
        <p>This is text in the p tag</p>
        There is more text in the body after the p tag
    </body>
</html>

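I would like to use the HTML parser CheerioJS to collect every HTML tag into an array so that I can manipulate them.

The desired output is the following array: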

[html, head, title, /title, /head, body, br, h1, /h1, p, /p, /body, /html]

I have been looking at Cheerio's DOM object, but I am not sure whether it is what I need.

Answer 1

You can do this:

$('*').get().map(el => el.name)
// [ 'html', 'head', 'title', 'body', 'br', 'h1', 'p' ]

Note that closing tags are not discrete nodes; they are part of the same node as the opening tag.
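Because closing tags are not separate nodes, a list like the one in the question has to be produced while walking the tree: push the name on the way into an element and `/name` on the way out. The following is only a minimal sketch, not part of Cheerio's API. `collectTags`, `isElement`, `el`, `text`, and `VOID_ELEMENTS` are hypothetical names, and the hand-built `tree` assumes the node shape Cheerio exposes (`type`, `name`, `children`) so that the example runs without any library.

```javascript
// HTML void elements: they never have a closing tag.
const VOID_ELEMENTS = new Set([
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Assumption: element nodes carry type 'tag' (or 'script' / 'style'),
// while text and comment nodes carry other types.
const isElement = (node) =>
    node.type === 'tag' || node.type === 'script' || node.type === 'style';

function collectTags(node, out = []) {
    if (!isElement(node)) return out;      // skip text, comments, etc.
    out.push(node.name);                   // "opening tag"
    for (const child of node.children || []) {
        collectTags(child, out);
    }
    if (!VOID_ELEMENTS.has(node.name)) {
        out.push('/' + node.name);         // "closing tag"
    }
    return out;
}

// Hypothetical helpers that build plain objects mimicking parsed nodes.
const text = (data) => ({ type: 'text', data });
const el = (name, ...children) => ({ type: 'tag', name, children });

// Stand-in for the parsed document from the question.
const tree = el('html',
    el('head', el('title', text('This is text within the title tag'))),
    el('body',
        text('This is text in the body tag'),
        el('br'),
        el('h1', text('This is text in the h1 tag')),
        el('p', text('This is text in the p tag')),
        text('There is more text in the body after the p tag')));

const tags = collectTags(tree);
console.log(tags);
// [ 'html', 'head', 'title', '/title', '/head', 'body', 'br',
//   'h1', '/h1', 'p', '/p', '/body', '/html' ]
```

With Cheerio loaded, passing `$('html').get(0)` instead of `tree` should presumably give the same result, assuming the parsed nodes have the shape described above.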

Answer 2

I don't think you need an external library; you can walk the DOM yourself with a small function.

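const list = [];

// Depth-first traversal: visit the node, then each of its children in order.
function walkTheDOM(node, iteratee) {
    iteratee(node);
    node = node.firstChild;

    while (node) {
        walkTheDOM(node, iteratee);
        node = node.nextSibling;
    }
}

// Runs in a browser; collects every node (text nodes included), not just tag names.
walkTheDOM(document.getElementsByTagName('html')[0], function (node) {
    list.push(node);
});

console.log(list);
// [html, head, text, meta, ...]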

Here is a Fiddle.
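The walker above collects node objects, text nodes included, rather than tag names. A hedged variation that keeps only element nodes and records their lowercase names might look like the sketch below. `tagNames` and `mockNode` are hypothetical names; `mockNode` only exists so the example runs outside a browser, and it mimics just the `nodeType`, `nodeName`, `firstChild`, and `nextSibling` properties of real DOM nodes.

```javascript
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE in the browser

function walkTheDOM(node, iteratee) {
    iteratee(node);
    for (let child = node.firstChild; child; child = child.nextSibling) {
        walkTheDOM(child, iteratee);
    }
}

function tagNames(root) {
    const names = [];
    walkTheDOM(root, (node) => {
        // Ignore text nodes, comments, and anything else that is not an element.
        if (node.nodeType === ELEMENT_NODE) {
            names.push(node.nodeName.toLowerCase());
        }
    });
    return names;
}

// Hypothetical stand-in for DOM nodes: links children via firstChild / nextSibling.
function mockNode(nodeType, nodeName, children = []) {
    children.forEach((child, i) => {
        child.nextSibling = children[i + 1] || null;
    });
    return { nodeType, nodeName, firstChild: children[0] || null, nextSibling: null };
}

const root = mockNode(1, 'HTML', [
    mockNode(1, 'HEAD', [mockNode(1, 'TITLE', [mockNode(3, '#text')])]),
    mockNode(1, 'BODY', [
        mockNode(3, '#text'),
        mockNode(1, 'BR'),
        mockNode(1, 'H1', [mockNode(3, '#text')]),
        mockNode(1, 'P', [mockNode(3, '#text')]),
        mockNode(3, '#text'),
    ]),
]);

const names = tagNames(root);
console.log(names);
// [ 'html', 'head', 'title', 'body', 'br', 'h1', 'p' ]
```

In a browser, calling `tagNames(document.documentElement)` would apply the same filter to the live document.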

Retrieve all tag names from a CheerioJS DOM object
