Unable to format HTML output

Time: 2017-11-19 17:43:51

Tags: python html python-2.7 web-scraping

I have Python code that returns an HTML page. That page contains a line reading "2092 Pittman Road", which is the parcel address. My code is as follows:

    import mechanize

    br = mechanize.Browser()
    response = br.open("https://www.matsugov.us/myproperty")
    # Select the search form by its name attribute
    for form in br.forms():
        if form.attrs.get('name') == 'frmSearch':
            br.form = form
            break
    br.form['ddlType'] = ["taxid"]      # search by tax ID
    br['txtParm'] = "218N02W27C003"     # tax ID to look up
    req = br.submit().read()            # raw HTML of the results page
    print req

req gives me the output as raw HTML. You can run this code as-is to see the output.

2 answers:

Answer 0: (score: 0)

Feed your HTML to BeautifulSoup, then navigate or format it as needed.
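As a rough sketch of what that could look like: the snippet below parses an HTML string, re-indents it with `prettify()`, and pulls the text out of one cell. `SAMPLE_HTML` and the helper names `pretty_html` and `first_cell_text` are made up for illustration; the real page's markup is not shown in the question, so the sample only stands in for the string held in `req`. It is written to run on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML returned by br.submit().read();
# the real page's markup is not shown in the question.
SAMPLE_HTML = (
    "<html><body><table><tr>"
    "<td class='Grid_5'>2092 Pittman Road</td>"
    "</tr></table></body></html>"
)


def pretty_html(html):
    """Return the HTML re-indented, one tag or text node per line."""
    return BeautifulSoup(html, "html.parser").prettify()


def first_cell_text(html, css_class):
    """Return the stripped text of the first <td> with this class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", class_=css_class)
    return cell.get_text(strip=True) if cell is not None else None


if __name__ == "__main__":
    print(pretty_html(SAMPLE_HTML))                 # formatted markup
    print(first_cell_text(SAMPLE_HTML, "Grid_5"))   # just the cell text
```

With the question's code, you would pass `req` in place of `SAMPLE_HTML`.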

Answer 1: (score: 0)

Use this code; it should work for you:

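    from bs4 import BeautifulSoup
    import mechanize

    br = mechanize.Browser()
    response = br.open("https://www.matsugov.us/myproperty")
    # Select the search form by its name attribute
    for form in br.forms():
        if form.attrs.get('name') == 'frmSearch':
            br.form = form
            break
    br.form['ddlType'] = ["taxid"]
    br['txtParm'] = "218N02W27C003"
    req = br.submit().read()
    # Parse the returned HTML and locate the first cell with class 'Grid_5'
    soup = BeautifulSoup(req, 'html.parser')
    table = soup.find('td', {'class': 'Grid_5'})
    # find() returns None when nothing matches, so guard before iterating
    if table is not None:
        for row in table:   # iterates over the children of the matched <td>
            print row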
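Iterating over the matched tag yields its child nodes, tags included, so the printed output may still contain markup. If only the text is wanted, something along these lines might help. It is a sketch, not a drop-in fix: only the class name `Grid_5` and the address come from this thread, while the cell contents in `SAMPLE_HTML` and the helper name `cell_strings` are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical cell contents; only the class name 'Grid_5' and the address
# come from the thread, the surrounding markup is made up for illustration.
SAMPLE_HTML = (
    "<table><tr><td class='Grid_5'>"
    "<b>Site Address</b><br/> 2092 Pittman Road "
    "</td></tr></table>"
)


def cell_strings(html, css_class):
    """Return the non-empty text fragments of the first matching <td>."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", {"class": css_class})
    if cell is None:  # find() returns None when nothing matches
        return []
    # stripped_strings skips whitespace-only nodes and trims each fragment
    return list(cell.stripped_strings)


if __name__ == "__main__":
    for text in cell_strings(SAMPLE_HTML, "Grid_5"):
        print(text)
```

Returning an empty list for a missing cell keeps the calling loop simple; whether that is the right behavior depends on how the real page is structured.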