Python Scraper - Finding data in a column

Date: 2018-01-10 21:06:27

Tags: html python-3.x web-scraping beautifulsoup findall

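I'm building my first web scraper, and I'm trying to get the number 41,110, which is stored in a column on the page https://mcassessor.maricopa.gov/mcs.php?q=14014003N. My code is below.

How can I get this number and print it?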

from bs4 import BeautifulSoup
import requests
web_page = 'https://mcassessor.maricopa.gov/mcs.php?q=14014003N'
web_header = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
response = requests.get(web_page,headers=web_header)
soup = BeautifulSoup(response.content,'html.parser')
for row in soup.findAll('table')[0].thread.tr.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    print(first_column)

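1 Answer:

Answer 0 (score: 0)

A straightforward approach is to locate the "improvements" table, take the first non-header row, and then take the last cell in that row: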

table = soup.find("table", id="improvements-table")
first_row = table.find_all("tr")[1]  # skipping a header
last_cell = first_row.find_all("td")[-1]
print(last_cell.get_text())  # prints 41,110
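
To try the same lookup without requesting the live page, here is a minimal sketch that runs against an inline HTML fragment. The markup in SAMPLE_HTML and the helper name last_cell_of_first_row are assumptions made for illustration. Only the table id and the cell values come from this answer, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the table id and the cell values are taken
# from the answer; the real page may contain more columns and wrappers.
SAMPLE_HTML = """
<table id="improvements-table">
  <tr><th>Imp #</th><th>Description</th><th>Sq Ft.</th></tr>
  <tr><td>000101</td><td>Mini-Warehouse</td><td>41,110</td></tr>
  <tr><td>000201</td><td>Site Improvements</td><td>1</td></tr>
</table>
"""


def last_cell_of_first_row(html):
    """Return the text of the last cell in the first data row, or None."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="improvements-table")
    if table is None:
        return None  # the table is missing from this document
    rows = table.find_all("tr")
    if len(rows) < 2:
        return None  # only a header row, or no rows at all
    cells = rows[1].find_all("td")  # rows[0] is the header row
    return cells[-1].get_text(strip=True) if cells else None


print(last_cell_of_first_row(SAMPLE_HTML))
```

Returning None when the table or row is missing avoids an AttributeError or IndexError if the page layout changes.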

A more general approach is to build a list of dictionaries from that table, where the keys are the header names:

table = soup.find("table", id="improvements-table")
headers = [th.get_text() for th in table('th')]

data = [dict(zip(headers, [td.get_text() for td in row('td')])) for row in table("tr")[1:]]
print(data)
print(data[0]['Sq Ft.'])

Prints:

[
    {u'Imp #': u'000101', u'Description': u'Mini-Warehouse', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'41,110', u'CCI': u'C', u'Model': u'386'}, 
    {u'Imp #': u'000201', u'Description': u'Site Improvements', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'1', u'CCI': u'D', u'Model': u'163'}
]
41,110
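
Note that the value comes back as the string '41,110', not as a number. If you need it for arithmetic, one possible sketch is to strip the thousands separator before converting. The helper name parse_int is hypothetical, and it assumes the cell holds a plain integer with commas as separators.

```python
def parse_int(text):
    """Convert a string such as '41,110' to an int.

    Assumes commas are thousands separators and that the cell
    contains nothing but digits, commas and surrounding whitespace.
    """
    return int(text.replace(",", "").strip())


print(parse_int("41,110"))  # 41110
```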