Question

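I need to get the URLs of all <a> tags from a given web page. I also need to skip any <a> tags that sit between the header and footer tags. I am trying to load the HTML of the body tag without the header tag. Here is my code, but it does not work.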

var $ = cheerio.load(html);
$ = cheerio.load($('body').not('header'));

var links = $("a']");
links.each(function() {
    console.log($(this).attr('href'));
});

If the code above is wrong, how should I do this instead?

Answer 1

Cheerio works much like jQuery.

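var cheerio = require('cheerio');
var $ = cheerio.load(html);
// .not() filters the matched set itself, so $('body').not('header')
// still returns <body>. Select every link instead, then drop the ones
// inside <header> or <footer>.
var links = $('a').not('header a, footer a');

links.each(function() {
    // Cheerio nodes are not browser DOM elements, so read the attribute with attr().
    console.log($(this).attr('href'));
});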

Answer 2

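I think the error is that the second load call is not given HTML. You are passing it the Cheerio selection for body rather than an HTML string. You should be able to do this: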

var $ = cheerio.load(html);
$ = cheerio.load($('body').html());

$('header').remove();

console.log($.html());

Answer 3

This is what I ended up doing, and it works. Can anyone tell me whether this is the right way to do it?

var $ = cheerio.load(body);
var t = $('body');
t.children('header').remove();
t.children('footer').remove();
var t = $.html(t);
var $ = cheerio.load(t);
var links = $("a");
links.each(function() {
    console.log($(this).attr('href'));
});

Thanks,

Loading specific HTML with cheerio in Node.js?
