I'm trying to do some web scraping with BeautifulSoup, but I get an error when I try to grab the contents of the div with class addressInfo. Here is part of the site I'm scraping:
<h4>Altônia</h4>
<div class="addressInfo">
Rua Getulio Vargas, 1201<br>
Centro - Iporã - PR<br>
87550-000<br>
<br>
(44) 3659-2721<br>
<a href="mailto:altoniacentro.pr@escolas.com.br">altoniacentro.pr@escolas.com.br</a><br>
</div>
Here is my code:
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re
# Copy all of the content from the provided web page
webpage = urlopen('site url....').read()
# Grab everything that lies between the h4 tags using a REGEX
patFinderTitle = re.compile('<h4>(.*)</h4>')
# Grab everything that lies between the class addressInfo tags using a REGEX
patFinderAddress = re.compile('<div class="addressInfo">(.*)</div>')  # <- error is raised here
Here is the error I get:
raise ValueError('Cannot process flags argument with a compiled pattern')
ValueError: Cannot process flags argument with a compiled pattern
How can I fix this?
Answer (score: 2)
You're better off using XPath (via lxml); it's simpler. Try this:
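from lxml import html
import requests

url = 'http://.....'
page = requests.get(url)
tree = html.fromstring(page.text)
# Text of every <h4> heading
a = tree.xpath('//h4/text()')
# Direct text children of the addressInfo div: the lines between the <br> tags
# (leading newlines/whitespace included; the e-mail is inside <a>, so it is not here)
b = tree.xpath('//div[@class="addressInfo"]/text()')
# Text of every <a> element on the page, including the e-mail link
c = tree.xpath('//a//text()')
print(a, b, c)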
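If you want to stay with regular expressions, or just understand the error, here is a minimal sketch using only the standard `re` module. The `ValueError` is raised by `re` when a `flags` argument is combined with a pattern that is already compiled, for example `re.findall(patFinderAddress, webpage, re.DOTALL)`. The line marked in the question compiles a plain string, so the failing call is presumably elsewhere in the full script (an assumption, since that part isn't shown). The sketch runs against the HTML snippet from the question instead of fetching the site; `SAMPLE`, `extract`, and the regex names are hypothetical, chosen for illustration. Regex parsing of HTML is brittle, so for real pages the XPath approach above is the safer choice.

```python
import re

# Sample markup copied from the question (no network access needed).
SAMPLE = '''<h4>Altônia</h4>
<div class="addressInfo">
Rua Getulio Vargas, 1201<br>
Centro - Iporã - PR<br>
87550-000<br>
<br>
(44) 3659-2721<br>
<a href="mailto:altoniacentro.pr@escolas.com.br">altoniacentro.pr@escolas.com.br</a><br>
</div>'''

# Give the flags to re.compile() once, then call methods on the compiled
# pattern. re.DOTALL lets '.' match newlines (the div spans several lines);
# the non-greedy '.*?' stops at the first closing tag.
TITLE_RE = re.compile(r'<h4>(.*?)</h4>', re.DOTALL)
ADDRESS_RE = re.compile(r'<div class="addressInfo">(.*?)</div>', re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')
BR_RE = re.compile(r'<br\s*/?>')


def extract(markup):
    """Return (titles, address_lines) found in the markup."""
    titles = TITLE_RE.findall(markup)
    address_lines = []
    for block in ADDRESS_RE.findall(markup):
        # Split on <br>, strip leftover tags such as <a>, drop blank lines.
        for part in BR_RE.split(block):
            text = TAG_RE.sub('', part).strip()
            if text:
                address_lines.append(text)
    return titles, address_lines


titles, address = extract(SAMPLE)
print(titles)
print(address)

# Reproduces the error: a flags argument together with a compiled pattern.
try:
    re.findall(ADDRESS_RE, SAMPLE, re.DOTALL)
except ValueError as exc:
    print('ValueError:', exc)
```

Compiling each pattern once with its flags and then calling `pattern.findall(...)` avoids the error entirely; the `try` block at the end only demonstrates what triggers it.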