Question

I'm building my first web scraper, and I'm trying to get the number 41,110, which is stored in a column on the page https://mcassessor.maricopa.gov/mcs.php?q=14014003N. My code is below.

How can I get this number and print it?

from bs4 import BeautifulSoup
import requests
web_page = 'https://mcassessor.maricopa.gov/mcs.php?q=14014003N'
web_header = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
response = requests.get(web_page,headers=web_header)
soup = BeautifulSoup(response.content,'html.parser')
for row in soup.findAll('table')[0].thread.tr.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    print(first_column)

Answer 1

A straightforward approach is to locate the "improvements" table, take the first non-header row, and then get the last cell in that row:

table = soup.find("table", id="improvements-table")
first_row = table.find_all("tr")[1]  # skipping a header
last_cell = first_row.find_all("td")[-1]
print(last_cell.get_text())  # prints 41,110
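
The positional lookup above relies on the target column being the last one. As a rough sketch of a variation, the column can instead be found by its header text. The SAMPLE_HTML fragment and the cell_by_header helper are hypothetical stand-ins made up for illustration; the real page's markup may differ, so treat this as a sketch under those assumptions rather than a drop-in solution:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: a reduced table that mirrors
# the id and a few of the columns mentioned in this answer.
SAMPLE_HTML = """
<table id="improvements-table">
  <tr><th>Imp #</th><th>Description</th><th>Sq Ft.</th></tr>
  <tr><td>000101</td><td>Mini-Warehouse</td><td>41,110</td></tr>
  <tr><td>000201</td><td>Site Improvements</td><td>1</td></tr>
</table>
"""


def cell_by_header(html, table_id, header, row_index=0):
    """Return the text of the cell under `header` in the given data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError("no table with id %r" % table_id)

    # Map the header text to a column position.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    column = headers.index(header)  # raises ValueError if the header is missing

    # Keep only rows that contain data cells, which skips the header row.
    data_rows = [tr for tr in table.find_all("tr") if tr.find("td")]
    return data_rows[row_index].find_all("td")[column].get_text(strip=True)


value = cell_by_header(SAMPLE_HTML, "improvements-table", "Sq Ft.")
print(value)  # prints 41,110
```

With the live page, response.content from the question would take the place of SAMPLE_HTML, assuming the table id and header text match what is shown here.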

A more general approach is to build a list of dictionaries from that table, with the header names as keys:

table = soup.find("table", id="improvements-table")
headers = [th.get_text() for th in table('th')]

data = [dict(zip(headers, [td.get_text() for td in row('td')])) for row in table("tr")[1:]]
print(data)
print(data[0]['Sq Ft.'])

Prints:

[
    {u'Imp #': u'000101', u'Description': u'Mini-Warehouse', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'41,110', u'CCI': u'C', u'Model': u'386'}, 
    {u'Imp #': u'000201', u'Description': u'Site Improvements', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'1', u'CCI': u'D', u'Model': u'163'}
]
41,110
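
Note that the scraped values are strings, so 41,110 still carries its thousands separator. If a real number is needed, something along these lines should work. The to_int helper is a hypothetical name, and the rows are copied from the output above with only a few keys kept:

```python
# Rows copied from the printed output above (only the relevant keys kept).
data = [
    {"Imp #": "000101", "Description": "Mini-Warehouse", "Sq Ft.": "41,110"},
    {"Imp #": "000201", "Description": "Site Improvements", "Sq Ft.": "1"},
]


def to_int(text):
    """Convert a string such as '41,110' to an int by dropping the commas."""
    return int(text.replace(",", "").strip())


sq_ft = [to_int(row["Sq Ft."]) for row in data]
print(sq_ft)       # prints [41110, 1]
print(sum(sq_ft))  # prints 41111
```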

Python Scraper - Finding Data in a Column
