Question

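Good evening, everyone.

I've been studying cheerio and am trying to parse data from a site. Its structure is shown below; I'll skip straight to the body: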

<body>
<form>
<div class="a">
<h3>Text A</h3>
<h4> Sub-Text A</h4>
<div class="Sub-Class A"> some text </div>
<h4> Sub-Text B</h4>
<div class="Sub-Class B"> some text </div>
<h4> Sub-Text C</h4>
<div class="Sub-Class C"> some text </div>

<h3>Text B</h3>
...
...

<h3>Text C</h3>
</div>
</form>
</body>

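The task is to parse the data into an array, from each h3 up to the next h3 (that is, an h3 plus all the h4 and div elements that follow it, stopping at the next h3). I started writing a function but got stuck at exactly this point. How can I make the function understand that everything after an h3 and before the next h3 should be written into a single array element?

The code I have so far: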

const Nightmare = require('nightmare');
const cheerio = require('cheerio');
const nightmare = Nightmare({show: true})
nightmare  
    .goto(url)
    .wait('body')
    .evaluate(()=> document.querySelector('body').innerHTML)
    .end()
    .then(response =>{
        console.log(getData(response));
    }).catch(err=>{
        console.log(err);
    });

let getData = html => {
    const data = []; // declared locally so it does not leak as an implicit global
    const $ = cheerio.load(html);
    $('form div.a').each((i, elem)=>{
        data.push({

        });
    });
    return data;
}

Answer 1

You can simply follow the next() elements until you reach an h3:

let texts = $('h3').map((i, el) => {
  let text = ""
  // Walk the following siblings until the next h3 or the end of the parent
  for (el = $(el).next(); el.length > 0 && el.prop('tagName') !== 'H3'; el = el.next()) {
    text += el.text() + "\n"
  }
  return text
}).get()
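
If you would rather keep each h3 together with its h4/div texts as separate fields instead of one joined string, here is a minimal sketch of the same idea. It assumes the children of div.a are first flattened into { tag, text } pairs; groupByHeading, the node shape, and the sample array are hypothetical names made up for illustration, not part of cheerio. The getData glue only uses children(), map(), prop(), text() and get(), and has not been run against the real page.

```javascript
// Sketch: group a flat list of sibling nodes into one entry per h3.
// `groupByHeading` and the { tag, text } node shape are hypothetical,
// not part of cheerio.
function groupByHeading(nodes, headingTag = 'h3') {
  const wanted = headingTag.toLowerCase();
  const groups = [];
  let current = null;
  for (const { tag, text } of nodes) {
    if (tag.toLowerCase() === wanted) {
      // A new heading starts a new array element.
      current = { heading: text.trim(), items: [] };
      groups.push(current);
    } else if (current) {
      // Everything else belongs to the most recent heading.
      current.items.push(text.trim());
    }
    // Nodes that appear before the first heading are ignored.
  }
  return groups;
}

// Assumed glue for the page in the question:
// `$` is expected to be the result of cheerio.load(html).
function getData($) {
  const nodes = $('form div.a')
    .children()
    .map((i, el) => ({ tag: $(el).prop('tagName'), text: $(el).text() }))
    .get();
  return groupByHeading(nodes);
}

// Stand-in data mirroring the structure in the question.
const sample = [
  { tag: 'H3', text: 'Text A' },
  { tag: 'H4', text: ' Sub-Text A' },
  { tag: 'DIV', text: ' some text ' },
  { tag: 'H3', text: 'Text B' },
];
console.log(JSON.stringify(groupByHeading(sample)));
```

Keeping the grouping separate from the cheerio calls makes it easy to check the logic on plain data before pointing it at the scraped HTML.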
