Question

I'm trying to scrape text from a website, but I can't seem to extract anything.

The structure and code are below.

My code:

const rp = require("request-promise");
const $ = require("cheerio");
const url = "xx";

rp(url)
  .then(function(html) {
    //success!
    let token = "ce-bodytext";
    console.log($(token, response).length);
    console.log($(token, html)).text;
  })
  .catch(function(err) {
    console.log(JSON.stringify(err));
  });

I only need the text, but the tags have no id. I also expected ce-bodytext to extract all the values in order, but all I get is empty output.

{}

How can I extract only the text shown in the image?

Answer 1

Try this:

let token = ".ce-bodytext>p>strong>font>font";
console.log($(token, html).text());

Answer 2

ce-bodytext is a class, and you forgot to put a . in front of it:

const token = '.ce-bodytext';

That will at least fix the empty output.
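
A further note on the {} itself, assuming the code ran exactly as posted: the first console.log refers to response, which is never declared, so that line throws a ReferenceError and control jumps to the catch block. JSON.stringify on an Error yields {} because message and stack are not enumerable properties. The sketch below reproduces this without any network call; undeclaredName is a hypothetical stand-in for response.

```javascript
// Referencing an undeclared name (like `response` in the question) throws a ReferenceError.
let caught;
try {
  undeclaredName; // hypothetical stand-in for `response`
} catch (err) {
  caught = err;
}

// Error properties such as message and stack are not enumerable,
// so JSON.stringify produces an empty object literal.
const asJson = JSON.stringify(caught);

// Reading name and message directly gives a usable description.
const readable = `${caught.name}: ${caught.message}`;

console.log(asJson);
console.log(readable);
```

So besides the missing dot, passing html instead of response and logging err.message (or the error itself) in the catch block should make the real problem visible. The second console.log in the question also needs .text() called on the selection, inside the parentheses of console.log.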

Scraping a website with deeply nested element tags using nodejs cheerio
