Need code to get specific text from a URL

Time: 2014-08-31 22:59:16

Tags: python python-3.x request urllib

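I'm trying to figure out what I need to add to this code so that, after reading the URL's source, I can strip out everything except the text between the tags and then print the result.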

import urllib.request

req = urllib.request.Request('http://myurlhere.com')
response = urllib.request.urlopen(req)
the_page = response.read()
print(the_page)

1 answer:

Answer 0 (score: 0)

You need an HTML parser.

Example using BeautifulSoup (it supports Python 3.x):

import urllib.request
from bs4 import BeautifulSoup

req = urllib.request.Request('http://onlinepermits.co.escambia.fl.us/CitizenAccess/Cap/CapDetail.aspx?Module=Building&capID1=14ACC&capID2=00000&capID3=00386&agencyCode=ESCAMBIA')
response = urllib.request.urlopen(req)
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response, 'html.parser')
# Find the owner cell by its id, then take the text of the table nested inside it
print(soup.find('td', id='ctl00_PlaceHolderMain_PermitDetailList1_owner').div.table.text)

Prints:

SNB HOTEL INC2607 WILDE LAKE BLVD PENSACOLA FL 32526
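
If installing BeautifulSoup is not an option, roughly the same extraction can be sketched with the standard library's html.parser module. This is only a sketch under assumptions: the TextById class, the sample_html string, and the short id "owner" are hypothetical stand-ins for the real page and its much longer id, and a hand-rolled depth counter takes the place of BeautifulSoup's find(). It also assumes reasonably well-formed markup, so real pages may need more care.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # nesting depth inside the target element (0 = outside)
        self.done = False    # True once the target element has been closed
        self.chunks = []     # text fragments found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data.strip())

    def text(self):
        # Join the non-empty fragments with single spaces.
        return " ".join(chunk for chunk in self.chunks if chunk)


# Hypothetical sample markup standing in for the downloaded page.
sample_html = """
<table><tr>
  <td id="owner"><div><table><tr><td>SNB HOTEL INC</td></tr>
    <tr><td>2607 WILDE LAKE BLVD<br>PENSACOLA FL 32526</td></tr></table></div></td>
  <td id="other">ignored</td>
</tr></table>
"""

parser = TextById("owner")
parser.feed(sample_html)
parser.close()
owner_text = parser.text()
print(owner_text)
```

With a live page, the bytes returned by response.read() would first need to be decoded (for example with .decode('utf-8'), assuming the page is UTF-8) before being passed to feed(). Joining the fragments with spaces also avoids the run-together "INC2607" seen in the output above.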