Question

The item I am trying to access in the following HTML is "GMV DLL VERSION2":

    <div class="container content">

  <main>
    <h2 id="rpcs--gmv-dll-version"><a href="/artifacts/vistaRPC%20documentation/TableOfContent">RPCs</a> → GMV DLL VERSION</h2>

<h3 id="vista-file-8994">VISTA File 8994</h3>

<table>
  <thead>
    <tr>
      <th>property</th>
      <th>value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>label</td>
      <td>GMV DLL VERSION2</td>

I am trying to scrape this page (http://vistadataproject.info/artifacts/vistaRPC%20documentation/GMV%20DLL%20VERSION)

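and write the output to a text file. I tested the same approach successfully against reddit.com. On this page, however, I cannot seem to get a single element. To test it, before even tackling the table, I have been trying to grab elements that appear very early on the page (in the top area).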

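The lack of classNames and ids in the table is tricky, but not even being able to get the header text really makes me wonder what is going on. Any input would be appreciated.

const request = require('request');
const cheerio = require('cheerio');
const fs = require('fs');

request('http://vistadataproject.info/artifacts/vistaRPC%20documentation/GMV%20DLL%20VERSION', (err, res, body) => {
  if (err) {
    console.log('Error: ' + err);
    return; // res and body are undefined when the request fails
  }
  console.log('Status: ' + res.statusCode);

  const $ = cheerio.load(body);

  $('header.masthead > div.container').each((index, el) => {
    // Arrow functions do not bind `this`, so wrap the element argument instead of using $(this).
    const tableData = $(el).find('a.logo').text();
    console.log('Table Contents: ' + tableData);

    fs.appendFileSync('test.txt', tableData + '\n' + 'Captured');
  });
});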

Answer 1

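The problem is that 'masthead' is a class name, not an id. The same goes for 'container' and 'logo'. So you need to adjust your selectors accordingly: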

$('header.masthead > div.container').each(( index, tr ) => {

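However, this will only get the header information, which does not contain the table with the 'property => value' data. For that information, you just need to look for the child table under the '<main>' tag.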

Cheerio cannot grab a table that has no ID or className, even though the path is being accessed correctly
