机械化:跟随div内的链接

时间:2012-07-14 04:36:36

标签: python html mechanize

最强大的方法是使Mechanize跟随位于某个div内的链接(br.follow_link)吗?我知道如何通过BeautifulSoup的一些帮助来做到这一点,但有没有办法用Mechanize做到这一点?

示例div:

<div id="blah_links">
 <a href="LINK1" class="active">1</a> |
 <a href="LINK2">2</a> |
 <a href="LINK3">3</a> |
 <a href="LINK4">NEXT</a>
</div>

1 个答案:

答案 0 :(得分:1)

我最近遇到了类似的问题,这就是我做的事情

url = "www.somewhere.com"
br = mechanize.Browser()
br.open(url)

encoded_data = UnicodeDammit(br.response().read(),isHTML=True).unicode
parser = lxml_html.fromstring(encoded_data)

soup_xpath = "//div[@id='BODYCON']//a/@href"
valid_links = soup.xpath(soup_xpath)
links  = [ link for link if link.url in valid_links ]