Question

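I'm fairly new to Python, and for one of my research projects I need a web scraper that collects web page content to build a dataset.

Since most threads recommend the beautifulsoup package, I tried building a Python-based web scraper with it.

The data I need is only loaded after a button on the page is clicked.

Here is an example: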

http://www.engadget.com/products/apple/iphone/6/

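Clicking "12 reviews" opens a pop-up that displays the reviews. I need to scrape those reviews.

I have tried many approaches, but nothing seems to work so far. Could someone look at my code and tell me whether something else needs to be done, or suggest a different approach?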

import bs4
import requests

session = requests.Session()
url = "http://www.engadget.com/products/apple/iphone/6/"
page = session.get(url).text
soup = bs4.BeautifulSoup(page, "html5lib")

# Locate the list of rating criteria, then the label block for each criterion.
engadgetul = soup.find("ul", class_="product-criteria-bars")
engadgetdiv = engadgetul.find_all("div", class_="product-criteria-label")
for engadgetrv in engadgetdiv:
    # Look for comment paragraphs inside each label block.
    review = engadgetrv.find_all("p", "comment-text")
    # Nested so every block's comments are printed, not only the last block's.
    for rr in review:
        print(rr.span.string)

Answer 1

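The comments are loaded dynamically with JavaScript when you click those links, so they are not in the HTML that requests downloads. To see the requests the page sends to the server, open your browser's developer tools (F12 in Chrome) and go to the Network tab.

Request these URLs instead: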

http://www.engadget.com/a/hovercard_criteria_comments/?product_id=44337&criteria_id=1

http://www.engadget.com/a/hovercard_criteria_comments/?product_id=44337&criteria_id=2

(and so on for the other criteria_id values)
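
As a rough sketch of how the URLs above could be used from Python, the snippet below requests the endpoint once per criteria_id and pulls out the comment text. It assumes the response is an HTML fragment whose comments sit in <p class="comment-text"> elements (the class name is borrowed from the question's code, not verified against the site), and the function names extract_comments, fetch_comments and scrape_all are made up for illustration. The site may well have changed since this was written.

```python
import requests
from bs4 import BeautifulSoup

# Endpoint taken from the URLs above.
BASE_URL = "http://www.engadget.com/a/hovercard_criteria_comments/"


def extract_comments(html):
    """Return the text of every <p class="comment-text"> in an HTML fragment.

    The "comment-text" class is an assumption carried over from the
    question's code; adjust it to whatever the real response contains.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.find_all("p", class_="comment-text")
    return [p.get_text(strip=True) for p in paragraphs]


def fetch_comments(session, product_id, criteria_id):
    """Request the comments for one criterion and parse them."""
    response = session.get(
        BASE_URL,
        params={"product_id": product_id, "criteria_id": criteria_id},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_comments(response.text)


def scrape_all(product_id, criteria_ids):
    """Collect comments for several criteria, keyed by criteria_id."""
    with requests.Session() as session:
        return {
            criteria_id: fetch_comments(session, product_id, criteria_id)
            for criteria_id in criteria_ids
        }
```

Calling scrape_all(44337, [1, 2]) would request the two URLs listed above. Which other criteria_id values exist is something to check in the Network tab rather than guess.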
