Scraping HTML table text with no ID or class - JavaScript or jQuery

Time: 2018-06-22 10:02:18

Tags: javascript jquery html web-scraping

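I'm having some trouble scraping some data from an HTML table on a website. The tags I want to retrieve have no ID or class, so I would be grateful if you could help me: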

This is what the table looks like (the code has been cut short so that this post doesn't take up too much space):

<table class="table table-striped table-large1">
    <thead>
<tr class="small">
    <th>No</th>
    <th>Date/Time</th>
    <th colspan="7">Indexed pages /<br>
    Processed / Skipped / Fetched /<br>
    Change (Added / Removed)</th>
    <th>Proc.time</th>
    <th>Bandwidth</th>
    <th>Broken links</th>
    <th>Images</th>
    <th>Videos</th>
    <th>RSS</th>
    <th>News</th>
</tr>
</thead>
<tbody><tr class="block1">
    <td>1</td>
    <td><a href="site/3845806/chlog/?log=8950501" title="View details">2018-06-20 01:13</a></td>
    <td>944</td>
    <td>969</td>

    <td><i><strike>25</strike></i></td>
    <td>920</td>

    <td><i style="color:#900">↓-2</i></td>
    <td><i>-</i></td>
    <td><i>-2</i></td>

    <td>0:12:44s</td>
    <td>28.82M</td>
    <td>3</td>
<td>580</td>
<td>4</td>
<td>8</td>
<td>0</td>
</tr>
<tr class="block1">
    <td>2</td>
    <td><a href="site/3845806/chlog/?log=8934464" title="View details">2018-06-17 01:14</a></td>
    <td>946</td>
    <td>968</td>

    <td><i><strike>22</strike></i></td>
    <td>919</td>

    <td></td>
    <td><i>+2</i></td>
    <td><i>-2</i></td>

    <td>0:14:05s</td>
    <td>28.89M</td>
    <td>0</td>
<td>580</td>
<td>4</td>
<td>8</td>
<td>0</td>
    </tr>
(........)

These are the two lines I want to scrape:

<td><a href="site/3845806/chlog/?log=8950501" title="View details">2018-06-20 01:13</a></td>
<td>944</td>

These are at index 2 in every row; how can I get all of these values?

1 answer:

Answer 0: (score: 1)

Loop over all the tr tags and use jQuery's find() method to locate the specific td elements. Then use innerHTML = ""; to clear their HTML:

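// Walk every row of the table. Header rows contain <th> cells rather than
// <td> cells, so find("td") returns nothing for them and they are skipped.
$(".table-large1 tr").each(function() {
  var cells = $(this).find("td");
  // Require at least three cells so that indexes 1 and 2 are safe to touch.
  if (cells.length > 2) {
    cells[1].innerHTML = ""; // date/time cell (contains the link)
    cells[2].innerHTML = ""; // indexed-pages cell
  }
});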

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<table class="table table-striped table-large1">
  <thead>
    <tr class="small">
      <th>No</th>
      <th>Date/Time</th>
      <th colspan="7">Indexed pages /<br> Processed / Skipped / Fetched /<br> Change (Added / Removed)</th>
      <th>Proc.time</th>
      <th>Bandwidth</th>
      <th>Broken links</th>
      <th>Images</th>
      <th>Videos</th>
      <th>RSS</th>
      <th>News</th>
    </tr>
  </thead>
  <tbody>
    <tr class="block1">
      <td>1</td>
      <td><a href="site/3845806/chlog/?log=8950501" title="View details">2018-06-20 01:13</a></td>
      <td>944</td>
      <td>969</td>

      <td><i><strike>25</strike></i></td>
      <td>920</td>

      <td><i style="color:#900">↓-2</i></td>
      <td><i>-</i></td>
      <td><i>-2</i></td>

      <td>0:12:44s</td>
      <td>28.82M</td>
      <td>3</td>
      <td>580</td>
      <td>4</td>
      <td>8</td>
      <td>0</td>
    </tr>
    <tr class="block1">
      <td>2</td>
      <td><a href="site/3845806/chlog/?log=8934464" title="View details">2018-06-17 01:14</a></td>
      <td>946</td>
      <td>968</td>

      <td><i><strike>22</strike></i></td>
      <td>919</td>

      <td></td>
      <td><i>+2</i></td>
      <td><i>-2</i></td>

      <td>0:14:05s</td>
      <td>28.89M</td>
      <td>0</td>
      <td>580</td>
      <td>4</td>
      <td>8</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
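
The snippet above clears the two cells, whereas the question asks how to read their values. As a minimal sketch of the reading variant, the same row-by-row walk can collect the text instead of emptying it. This version uses the standard DOM table properties (`rows`, `cells`, `textContent`) rather than jQuery; the function name `readDateAndIndexed` and the shape of the returned objects are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: collects the date/time text, its link target, and the
// indexed-pages count from every data row of a table.
// `table` is expected to be an HTMLTableElement, or any object exposing the
// same `rows` / `cells` / `textContent` shape.
function readDateAndIndexed(table) {
  const results = [];
  for (const row of Array.from(table.rows)) {
    const cells = row.cells;
    // Skip header rows (their first cell is a <th>) and rows that are too short.
    if (cells.length < 3 || cells[0].tagName !== "TD") continue;
    const link = cells[1].querySelector("a");
    results.push({
      dateTime: cells[1].textContent.trim(),
      href: link ? link.getAttribute("href") : null,
      indexedPages: Number(cells[2].textContent.trim()),
    });
  }
  return results;
}

// Browser usage, assuming the table from the question is present on the page.
if (typeof document !== "undefined") {
  const table = document.querySelector("table.table-large1");
  if (table) console.log(readDateAndIndexed(table));
}
```

Selecting by position (cells 1 and 2) is what makes the missing IDs and classes irrelevant, but it also means the sketch depends on the column order staying as shown in the question.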