How to skip the first header row of a table when scraping

Time: 2017-09-03 19:41:28

Tags: python web-scraping psycopg2

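I want to skip the first header row in my scraped data. I am struggling to write the code for this, and any help would be greatly appreciated.

The code I have come up with so far: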

<head>
  <link type="text/css" rel="stylesheet" href="stylesheet.css"/>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
</head>
<body>
  <div class="container">
    <div class="row">
      <div class="col-md-10 col-md-offset-1">
        <div class="tile-container">
          <div class="tile">A</div><!--
       --><div class="tile">B</div><!--
       --><div class="tile">C</div><!--
       --><div class="tile">D</div><!--
       --><div class="tile">E</div>
        </div>
      </div>
    </div>
  </div>
</body>

2 answers:

Answer 0: (score: 1)

Run this. You won't get that empty bracket anymore.

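import urllib.request
from bs4 import BeautifulSoup

# Download the page and parse it with the lxml parser.
soup = BeautifulSoup(urllib.request.urlopen("http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").read(), 'lxml')
table = soup.find('table', {"class": "tollinfotbl"})
# Collect the text of every <td> cell, one list per <tr> row.
rows = [[ele.text.strip() for ele in item.find_all("td")]
        for item in table.find_all("tr")]
# Joining the cells prints plain text, so a row without <td> cells
# shows up as an empty line rather than as [].
for data in rows:
    print(' '.join(data))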

If you prefer, you can use the requests module:

import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get("http://tis.nhai.gov.in/TollInformation?TollPlazaID=236").text,'lxml')
titles = soup.select("table.tollinfotbl")[0]
list_row = [[tab_d.text.strip() for tab_d in item.select('td')]
            for item in titles.select('tr')]

for data in list_row:
    print(' '.join(data))

The result is as follows:

45.00 70.00 1565.00 25.00

75.00 115.00 2525.00 40.00

160.00 240.00 5290.00 80.00

175.00 260.00 5770.00 85.00

250.00 375.00 8295.00 125.00

250.00 375.00 8295.00 125.00

305.00 455.00 10100.00 150.00
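
If the header row should be dropped from the list itself rather than just printed as a blank line, one possible sketch is to filter out rows that yielded no cells. This assumes the header row is built from `<th>` cells, so that collecting only `<td>` text leaves an empty list for it; the sample data below reuses the first two result rows shown above and is not fetched from the site.

```python
# Sample rows shaped like the scraper's output. The values are the first
# two result rows shown above; the leading empty list is an assumption
# about what a header row without <td> cells produces.
rows = [
    [],
    ["45.00", "70.00", "1565.00", "25.00"],
    ["75.00", "115.00", "2525.00", "40.00"],
]

# Keep only rows that actually contain cell text.
data_rows = [row for row in rows if row]

for row in data_rows:
    print(" ".join(row))
```

Filtering on emptiness also drops any other row that has no `<td>` cells, which may or may not be what you want for a given table.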

Answer 1: (score: 0)

This is really ugly and overkill, but here it is:

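# `tbody` is assumed to be an iterable of <tr> tags,
# for example the result of table.find_all("tr").
for row_num, row in enumerate(tbody):
    if row_num == 0:
        continue  # skip the header row
    cols = row.findChildren(recursive=False)
    cols = [ele.text.strip() for ele in cols]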
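
As a possible alternative to tracking the row position by hand, the standard library's `itertools.islice` can skip the leading items of any iterable. The sketch below is only an illustration: `skip_first` is a hypothetical helper name, and the table rows (including the header labels) are made-up sample data rather than output from the page.

```python
from itertools import islice


def skip_first(rows, count=1):
    """Return an iterator over `rows` that omits the first `count` items."""
    return islice(rows, count, None)


# Hypothetical sample data: one header row followed by two data rows.
table_rows = [
    ["Vehicle type", "Single", "Return"],
    ["Car", "45.00", "70.00"],
    ["Bus", "160.00", "240.00"],
]

for cols in skip_first(table_rows):
    print(" ".join(cols))
```

Unlike slicing with `[1:]`, this also works on generators and other iterables that do not support indexing.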