Question

I'm trying to build a web scraper, and I have successfully downloaded the HTML. Now I'm trying to find the titles in the HTML with this code:

fs.readFile(__filename.json , function (err, data) {
    if(err) throw err;

    const $ = cheerio.load(data);
    const urlElemt = $('article.product-tile');

    if(urlElemt){
        console.log("Found " + urlElemt.length + " elements");

        let urlTitle = $(urlElemt.find("h2.product-tile__title"));
        let urlPrice = $(urlElemt.find("span.__price"));

        for(let i = 0; i < 1; i++) {
            console.log(Title[i].children)
        }
    }
});

When I simply console.log it, I get the children of the object:

[ { type: 'text',
    data: 'Tuborg Grøn Pilsner Øl 4,6%',
    parent:
     { type: 'tag',
       name: 'h2',
       namespace: 'http://www.w3.org/1999/xhtml',
       attribs: [Object],
       'x-attribsNamespace': [Object],
       'x-attribsPrefix': [Object],
       children: [Circular],
       parent: [Object],
       prev: [Object],
       next: [Object] },
    prev: null,
    next: null } ]

Here data: 'Tuborg Grøn Pilsner Øl 4,6%' is the value I want to retrieve.

I have tried more than one way of reading it, for example

console.log(Title[i].children["data"])

but the result is always "undefined". What am I misunderstanding and/or doing wrong?

Answer 1

In your code

const urlElemt = $('article.product-tile')
...
let urlTitle = $(urlElemt.find("h2.product-tile__title"))

the find() function already returns a Cheerio object, so you don't need to pass it to the $ function. This is enough:

let urlTitle = urlElemt.find("h2.product-tile__title")

So you can do

console.log(urlTitle.text())

or

console.log(urlTitle.html())

to see the serialized version of the DOM node, which in your case should be a plain text string. (See the API docs.)
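As for the "undefined" itself, the logged output in the question suggests a likely cause: children is an array holding one text node, and an array has no data property of its own. The sketch below uses a hand-built plain object as a stand-in for that logged node (it is not a real Cheerio node) to show the difference.

```javascript
// Hand-built stand-in for the node array logged in the question.
const children = [
  { type: 'text', data: 'Tuborg Grøn Pilsner Øl 4,6%', prev: null, next: null },
];

// Dot and bracket access are equivalent, and an array has no "data"
// property, so both lookups come back undefined.
const viaDot = children.data;
const viaBracket = children['data'];

// The text lives on the first element of the array.
const title = children[0].data;

console.log(viaDot, viaBracket); // undefined undefined
console.log(title);              // Tuborg Grøn Pilsner Øl 4,6%
```

Indexing into the array works, but text() is usually the more robust choice because it does not depend on the node's internal layout.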

Cheerio returns undefined when I try to retrieve JSON data
