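I'm trying to figure out what I need to add to this code so that, after reading the URL's source, I can strip out everything except the text between the tags and then print the result.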
import urllib.request
req = urllib.request.Request('http://myurlhere.com')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page)
Answer 0 (score: 0)
You need an HTML parser.
Here is an example using BeautifulSoup (it supports Python 3.x):
import urllib.request
from bs4 import BeautifulSoup
req = urllib.request.Request('http://onlinepermits.co.escambia.fl.us/CitizenAccess/Cap/CapDetail.aspx?Module=Building&capID1=14ACC&capID2=00000&capID3=00386&agencyCode=ESCAMBIA')
response = urllib.request.urlopen(req)
# Name the parser explicitly; recent bs4 versions warn when it is omitted.
soup = BeautifulSoup(response, 'html.parser')
# Locate the owner cell by its id, then print the text of the table nested inside it.
print(soup.find('td', id='ctl00_PlaceHolderMain_PermitDetailList1_owner').div.table.text)
Output:
SNB HOTEL INC2607 WILDE LAKE BLVD PENSACOLA FL 32526
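If you want all of the text on the page rather than one cell, BeautifulSoup's `get_text()` does that in one call. If installing a third-party package is not an option, something along these lines should work with only the standard library. This is a minimal sketch, not a full HTML-to-text converter: the `TextExtractor` class, the `extract_text` helper, and the `sample` markup are made up for illustration, and it assumes you only want to skip `<script>` and `<style>` contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, ignoring <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside a skipped element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical sample markup standing in for the downloaded page.
sample = (
    "<html><head><style>td {color: red}</style></head><body>"
    "<table><tr><td>SNB HOTEL INC</td><td>2607 WILDE LAKE BLVD</td></tr></table>"
    "</body></html>"
)
print(extract_text(sample))
```

With the code from the question you would pass it the decoded page, for example `extract_text(the_page.decode('utf-8'))`, assuming the page is UTF-8 encoded.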