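I am building my first web scraper, and I am trying to get the number 41,110, which is stored in a column of the web page https://mcassessor.maricopa.gov/mcs.php?q=14014003N. My code is below.
How can I get this number and print it?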
from bs4 import BeautifulSoup
import requests
web_page = 'https://mcassessor.maricopa.gov/mcs.php?q=14014003N'
web_header = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
response = requests.get(web_page,headers=web_header)
soup = BeautifulSoup(response.content,'html.parser')
for row in soup.findAll('table')[0].thread.tr.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    print(first_column)
Answer 0 (score: 0)
A straightforward approach is to locate the "improvements" table, take its first non-header row, and then take the last cell in that row:
table = soup.find("table", id="improvements-table")
first_row = table.find_all("tr")[1] # skipping a header
last_cell = first_row.find_all("td")[-1]
print(last_cell.get_text()) # prints 41,110
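Note that the cell text is a string with a thousands separator. If the value is needed as a number, one option is to strip the comma before converting. This is a minimal sketch: `parse_int` is a hypothetical helper name, and the literal "41,110" stands in for `last_cell.get_text()`.

```python
def parse_int(text):
    """Convert a string such as '41,110' to the integer 41110."""
    # Remove surrounding whitespace and the thousands separator first.
    return int(text.strip().replace(",", ""))

# The sample value stands in for last_cell.get_text().
sq_ft = parse_int("41,110")
print(sq_ft)
```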
A more general approach is to build a list of dictionaries from the table, where the keys are the header names:
table = soup.find("table", id="improvements-table")
headers = [th.get_text() for th in table('th')]
data = [dict(zip(headers, [td.get_text() for td in row('td')])) for row in table("tr")[1:]]
print(data)
print(data[0]['Sq Ft.'])
Prints:
[
{u'Imp #': u'000101', u'Description': u'Mini-Warehouse', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'41,110', u'CCI': u'C', u'Model': u'386'},
{u'Imp #': u'000201', u'Description': u'Site Improvements', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'1', u'CCI': u'D', u'Model': u'163'}
]
41,110
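If the table might be missing or the page layout might change, the same idea can be wrapped in a function that fails softly. The sketch below is only an illustration: `table_to_dicts` is a hypothetical helper name, and `SAMPLE_HTML` is a small made-up stand-in for the real page (with fewer columns), so that the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page has more columns.
SAMPLE_HTML = """
<table id="improvements-table">
  <tr><th>Imp #</th><th>Description</th><th>Sq Ft.</th></tr>
  <tr><td>000101</td><td>Mini-Warehouse</td><td>41,110</td></tr>
  <tr><td>000201</td><td>Site Improvements</td><td>1</td></tr>
</table>
"""

def table_to_dicts(html, table_id):
    """Return one dict per data row, keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # No table with this id: return an empty list instead of raising.
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            rows.append(dict(zip(headers, cells)))
    return rows

data = table_to_dicts(SAMPLE_HTML, "improvements-table")
print(data[0]["Sq Ft."])
```

With the real page, `response.content` from the question's code would take the place of `SAMPLE_HTML`.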