Question

[Image: code with the table that I want to scrape]

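It lists "Location", but I want to find "33 Montrose Ave." for a set of tables like this one. I'm using BeautifulSoup and Requests to fetch the URL and parse it as HTML. It would be great if I could find the "Location" text and then use something like nextSibling to reach the value. Thanks!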

import requests
from bs4 import BeautifulSoup


website = requests.get(
    "http://wakefield.patriotproperties.com/Summary.asp?AccountNumber=6867"
)

content = website.content

soup = BeautifulSoup(content, "html.parser")

table = soup.find('table', {'class': ''})

data = soup.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in data.select("tr")]

Answer 1

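The text you are interested in is in the second table, inside a tr, then a td, as the text of the second b tag. You can easily adapt the following code to get what you want.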

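# `soup` is the BeautifulSoup object built in the question.
html_table = soup.find_all("table")[1]  # second table on the page
html_trs = html_table.find_all("tr")
for tr in html_trs:
    html_tds = tr.find_all("td")
    for td in html_tds:
        html_bs = td.find_all("b")
        if len(html_bs) < 2:
            continue  # skip cells that have no second <b> tag
        loctext = html_bs[1].text    # text of the second <b>
        loctext = loctext.lstrip()
        print("loctext=", loctext)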
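
The question also asks about finding the "Location" text and stepping to whatever follows it. Here is a minimal sketch of that idea. The markup in SAMPLE_HTML is a made-up stand-in (the real page's structure may differ), and value_after_label is a hypothetical helper name, not part of BeautifulSoup. It uses find_next("b") rather than next_sibling, because the next sibling of a tag is often just a whitespace text node.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which may be laid out differently.
SAMPLE_HTML = """
<table><tr><td>Header table</td></tr></table>
<table>
  <tr>
    <td><b>Location</b> <b> 33 Montrose Ave.</b></td>
  </tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the first <b> that follows a <b> whose text equals `label`."""
    for b in soup.find_all("b"):
        if b.get_text(strip=True) == label:
            nxt = b.find_next("b")  # next <b> in document order, skipping text nodes
            if nxt is not None:
                return nxt.get_text(strip=True)
    return None  # label not found, or nothing follows it


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
location = value_after_label(soup, "Location")
print(location)
```

Because the lookup is keyed on the label text instead of a table index, it should keep working if the table's position on the page changes, provided the label and its value are both in b tags as assumed here.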

How do I scrape a table from a website with Python when the table has no ID or class?
