Question

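I'm using lxml with Python. I want to get the href of the "More reviews (40)" link on this page. I'm basically scraping the site and want to collect the reviews.

Any help is much appreciated. Thanks.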

Answer 1

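The link is added by client-side JavaScript, so you can't get the href with ordinary HTML parsing. However, you can look through the JavaScript code in the page source and pull the link out of it: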

>>> import re
>>> import urllib2
>>> import lxml.html
>>> page = urllib2.urlopen("http://maps.google.com/maps/place?cid=2860002122405830765").read()

# have to search the page source since the link is added in javascript
>>> mo = re.search(r'<div class="pp-more-reviews">.*?</div>', page)
>>> div = lxml.html.fromstring(mo.group(0))
>>> href = div.find("a").attrib["href"]
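
For readers on Python 3, where urllib2 no longer exists, the same idea might be sketched as below. This is not the answerer's code: the function name find_more_reviews_href, the sample markup, and the sample href are all hypothetical, and the function works on a page source string that has already been downloaded (for example with urllib.request.urlopen(url).read().decode()). It also assumes the div could span several lines, hence re.DOTALL, and it returns None rather than raising when nothing matches.

```python
import re

import lxml.html

# Non-greedy match for the div that the page's JavaScript would insert.
# re.DOTALL lets "." cross newlines in case the markup spans several lines.
MORE_REVIEWS_RE = re.compile(r'<div class="pp-more-reviews">.*?</div>', re.DOTALL)


def find_more_reviews_href(page_source):
    """Return the href of the first link in the 'pp-more-reviews' div, or None."""
    mo = MORE_REVIEWS_RE.search(page_source)
    if mo is None:
        return None
    # Parse only the matched fragment; fromstring returns the div element.
    div = lxml.html.fromstring(mo.group(0))
    link = div.find(".//a")
    if link is None:
        return None
    return link.get("href")


# Hypothetical page source: the div only appears inside a script block.
SAMPLE_PAGE = """
<html><body>
<script>
var moreReviews = '<div class="pp-more-reviews"><a href="/reviews?cid=123">More reviews (40)</a></div>';
</script>
</body></html>
"""

print(find_more_reviews_href(SAMPLE_PAGE))
```

Searching the raw source with a regex is fragile: if the site changes the class name or escapes the quotes inside its JavaScript string, the pattern will stop matching, so checking for None before using the result is worthwhile.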

Other options include:

Use Selenium to drive a real browser.
Use the PhantomJS headless browser.

Answer 2

I tried doing it the following way. It's not very elegant, but it serves the purpose:

import urllib
from lxml import html
response = urllib.urlopen('http://maps.google.com/maps/place?cid=7101561317478851901').read()
dom = html.fromstring(response)
# xpath('@href') returns a list of the matching attribute values
href = dom.find_class('pp-more-reviews')[0].find_class('pp-more-content-link')[0].xpath('@href')

This gets the href of the link.
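
A slightly more defensive variant of this approach, written for Python 3, might look like the sketch below. The function name more_reviews_hrefs and the sample markup are hypothetical. It assumes the 'pp-more-reviews' div is actually present in the HTML being parsed; if the link is only injected by JavaScript, the function simply returns an empty list instead of raising an IndexError.

```python
from lxml import html


def more_reviews_hrefs(page_source):
    """Collect hrefs of 'pp-more-content-link' elements inside 'pp-more-reviews'."""
    dom = html.fromstring(page_source)
    hrefs = []
    # find_class returns a (possibly empty) list, so no [0] indexing is needed.
    for container in dom.find_class("pp-more-reviews"):
        for link in container.find_class("pp-more-content-link"):
            href = link.get("href")
            if href is not None:
                hrefs.append(href)
    return hrefs


# Hypothetical markup with the div present directly in the HTML.
SAMPLE = """
<html><body>
<div class="pp-more-reviews">
  <a class="pp-more-content-link" href="/reviews?cid=123">More reviews (40)</a>
</div>
</body></html>
"""

print(more_reviews_hrefs(SAMPLE))
```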
