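Good evening, everyone.
I have been learning cheerio and am trying to parse data from a site. Its structure is shown below; I'll go straight to the body: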
<body>
  <form>
    <div class="a">
      <h3>Text A</h3>
      <h4> Sub-Text A</h4>
      <div class="Sub-Class A"> some text </div>
      <h4> Sub-Text B</h4>
      <div class="Sub-Class B"> some text </div>
      <h4> Sub-Text C</h4>
      <div class="Sub-Class C"> some text </div>
      <h3>Text B</h3>
      ...
      ...
      <h3>Text C</h3>
    </div>
  </form>
</body>
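The task is to parse the data into an array, one element per h3 section: the h3 itself plus every h4 and div that follows it, up to the next h3. I started writing a function but got stuck on exactly this point. How do I make the function put everything after an h3, and before the next h3, into a single array element?
The code I have so far: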
const Nightmare = require('nightmare');
const cheerio = require('cheerio');

const nightmare = Nightmare({ show: true });

nightmare
  .goto(url) // url: the address of the site (not shown here)
  .wait('body')
  .evaluate(() => document.querySelector('body').innerHTML)
  .end()
  .then(response => {
    console.log(getData(response));
  })
  .catch(err => {
    console.log(err);
  });

// Parses the page HTML into an array; the body of data.push() is the part I am stuck on.
const getData = html => {
  const data = []; // declared with const to avoid an implicit global
  const $ = cheerio.load(html);
  $('form div.a').each((i, elem) => {
    data.push({
    });
  });
  return data;
};
Answer 0 (score: 0)
You can simply follow the next() elements until you reach an h3:
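let texts = $('h3').map((i, el) => {
  let text = "";
  el = $(el);
  // A cheerio object is always truthy, so the loop ends only via the break below.
  while (el = el.next()) {
    // Stop when there are no more siblings or the next h3 section starts.
    if (el.length === 0 || el.prop('tagName') === 'H3') break;
    text += el.text() + "\n";
  }
  return text; // text of everything between this h3 and the next one
}).get();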
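If you want structured elements rather than one concatenated string, here is a possible sketch of a different approach: read the children of div.a in document order and start a new array element at every h3. The names groupByHeading and parseSections, and the { title, items } shape, are made up for illustration and are not part of cheerio. The grouping is kept separate from the DOM access so it can be checked without loading a page.

```javascript
// Hypothetical helper: groups a flat, ordered list of sibling nodes into sections.
// Each node is { tag, text }; a node whose tag equals headingTag opens a new section.
function groupByHeading(nodes, headingTag = 'h3') {
  const sections = [];
  let current = null;
  for (const { tag, text } of nodes) {
    if (tag === headingTag) {
      current = { title: text.trim(), items: [] };
      sections.push(current);
    } else if (current) {
      // Anything before the first heading is skipped; the rest joins the open section.
      current.items.push({ tag, text: text.trim() });
    }
  }
  return sections;
}

// Hypothetical adapter: turns the question's markup into the flat list above.
// Assumes the h3/h4/div elements are direct children of 'form div.a'.
function parseSections(html) {
  // Required here so groupByHeading can be used without cheerio installed.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  const nodes = $('form div.a')
    .children()
    .map((i, el) => ({ tag: el.tagName.toLowerCase(), text: $(el).text() }))
    .get();
  return groupByHeading(nodes);
}
```

In the question's code, getData could then return parseSections(html). Cheerio also has .nextUntil('h3'), which selects the siblings after an element up to, but not including, the next h3, so it could replace the manual next() loop in the snippet above.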