How do I scrape a table from a website with Python when it has no ID or class?

Time: 2018-03-27 01:03:33

Tags: python web-scraping beautifulsoup python-requests

[Image: code with the table that I want to scrape]

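The page lists "Location", but I want to get the value " 33 Montrose Ave." that goes with it, and to do the same for a set of tables like this one. I am using BeautifulSoup and Requests to fetch the URL and parse it as HTML. It would be great if I could find the "Location" text and then use something like nextSibling to reach the value. Thanks!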

import requests
from bs4 import BeautifulSoup


website = requests.get("http://wakefield.patriotproperties.com/Summary.asp?AccountNumber=6867")

content = website.content

soup = BeautifulSoup(content, "html.parser")

table = soup.find('table', {'class': ''})

data = soup.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in data.select("tr")]

1 answer:

Answer 0 (score: 0)

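The text you are interested in is in the second table on the page: inside a tr, then a td, as the text of the second b tag. You can easily adapt the following code to what you need.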

# `soup` is the BeautifulSoup object built in the question's code.
html_table = soup.find_all("table")[1]  # second table on the page
for tr in html_table.find_all("tr"):
    for td in tr.find_all("td"):
        html_bs = td.find_all("b")
        if len(html_bs) < 2:
            continue  # skip cells that have no second <b> tag
        loctext = html_bs[1].text.strip()  # text of the second <b>
        print("loctext=", loctext)
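
Since the question asks about finding the "Location" text and stepping to whatever follows it, here is a minimal sketch of that label-based approach. It does not depend on the table's position. The `SAMPLE_HTML` markup and the helper name `value_after_label` are assumptions made for illustration; the real page may be laid out differently, so check its HTML before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption about the page, not its actual HTML.
SAMPLE_HTML = """
<table>
  <tr><td><b>Location</b> <b> 33 Montrose Ave.</b></td></tr>
  <tr><td><b>Owner</b> <b> Example Owner</b></td></tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the first <b> that follows the <b> whose text equals `label`."""
    # Match a <b> tag whose only string, ignoring surrounding whitespace, is the label.
    label_tag = soup.find("b", string=lambda s: s is not None and s.strip() == label)
    if label_tag is None:
        return None
    # find_next walks forward in document order, so this also works
    # when the value sits in a neighbouring <td> rather than the same one.
    value_tag = label_tag.find_next("b")
    return value_tag.get_text(strip=True) if value_tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_after_label(soup, "Location"))
```

`find_next` is used here instead of `nextSibling` because the direct sibling of a tag is often a whitespace text node, not the tag you are after. To run this against the real page, build `soup` from `requests.get(...).content` as in the question.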