Question

Here is the website I want to scrape: http://www.quickbid.com.tw/

I would like to get the content of the elements with class="timestamp" into a Python variable, so that I can parse the "timestamp" however I like.

I tried using Scrapy to scrape the "timestamp", but Scrapy does not execute JavaScript, so it never sees the JavaScript-generated value and I couldn't get it.

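I also tried using Firebug to monitor the packets exchanged between quickbid and my browser. I found that packets are sent every second to keep the timestamp in sync, but I still don't know how these packets are generated. I heard that Selenium could help me do this, but after reading the Selenium tutorial (http://www.jroller.com/selenium/), I still have no idea how to scrape the data I want.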
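Since Firebug shows the page requesting an update about once per second, one alternative to rendering JavaScript is to issue that same request yourself. The sketch below assumes, hypothetically, that the request URL copied from Firebug's Net panel returns JSON; `ENDPOINT`, `parse_payload`, and `poll` are illustrative names, not anything the site is known to expose.

```python
import json
import time
import urllib.request

# HYPOTHETICAL placeholder: replace with the request URL you actually see in Firebug.
ENDPOINT = "http://www.quickbid.com.tw/PATH_SEEN_IN_FIREBUG"


def parse_payload(raw: bytes) -> dict:
    """Decode one response body, assuming (hypothetically) that it is UTF-8 JSON."""
    return json.loads(raw.decode("utf-8"))


def poll(url: str, interval: float = 1.0, count: int = 5):
    """Fetch `url` `count` times, waiting `interval` seconds between requests."""
    for i in range(count):
        if i:
            time.sleep(interval)  # mimic the browser's once-per-second refresh
        with urllib.request.urlopen(url, timeout=10) as resp:
            yield parse_payload(resp.read())
```

Usage would be `for payload in poll(ENDPOINT): print(payload)`. If the endpoint returns something other than JSON, only `parse_payload` needs to change.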

Does anyone know how to get this data from the website? Any help would be greatly appreciated.

Answer 1

I usually use the plain requests and BeautifulSoup libraries for scraping. Here is what I did:

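import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML (requests does not execute JavaScript)
r = requests.get("http://www.quickbid.com.tw/")
soup = BeautifulSoup(r.content, "html.parser")
# Collect every <span class="timestamp"> present in the static HTML
timestamps = soup.find_all("span", {"class": "timestamp"})
print(timestamps)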

It returned:

[<span class="timestamp">Save91%</span>, <span class="timestamp">Save84%</span>, <span class="timestamp">Save96%</span>, <span class="timestamp">Save99%</span>, <span class="timestamp">Save82%</span>]

Hope this is what you're looking for.
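The same extraction can be sketched with only the standard library's `html.parser.HTMLParser`, returning the text inside each span rather than the tags. It runs here on a hypothetical snippet shaped like the output above; like the requests/BeautifulSoup version, it only sees the static HTML, because nothing executes the page's JavaScript.

```python
from html.parser import HTMLParser


class TimestampSpanParser(HTMLParser):
    """Collect the text of every <span> whose class list contains 'timestamp'."""

    def __init__(self):
        super().__init__()
        self.in_timestamp = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            if "timestamp" in classes:
                self.in_timestamp = True
                self.values.append("")

    def handle_endtag(self, tag):
        # Simplification: assumes timestamp spans contain no nested <span>.
        if tag == "span":
            self.in_timestamp = False

    def handle_data(self, data):
        if self.in_timestamp:
            self.values[-1] += data


def extract_timestamps(html: str) -> list:
    parser = TimestampSpanParser()
    parser.feed(html)
    parser.close()
    return [v.strip() for v in parser.values]


# Hypothetical sample HTML, modeled on the output shown above
sample = ('<div><span class="timestamp">Save91%</span>'
          '<span class="price">NT$10</span>'
          '<span class="timestamp">Save84%</span></div>')
print(extract_timestamps(sample))  # ['Save91%', 'Save84%']
```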

Answer 2

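You can definitely do this with Selenium. In fact, it's quite easy. Selenium has bindings for many different programming languages, so just pick the one you're most comfortable with and read the Selenium documentation for that language.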

I personally use Python, and its bindings are easy to understand.

Here is the Selenium documentation for Python.
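A minimal sketch of that approach, assuming Selenium 4's Python bindings and a local Firefox with geckodriver. A real browser runs the page's JavaScript, so the elements with class `timestamp` (the class named in the question) hold their live values. The function name, the 10-second wait, and sampling once per second are illustrative choices, not part of the answer above.

```python
import time


def scrape_timestamps(url="http://www.quickbid.com.tw/", samples=5, interval=1.0):
    """Return `samples` snapshots of the text of every element with class 'timestamp'."""
    # Imported here so this module can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumes Firefox + geckodriver are available
    try:
        driver.get(url)
        # Wait up to 10 s until the page's JavaScript has produced the elements.
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, "timestamp"))
        )
        snapshots = []
        for i in range(samples):
            if i:
                time.sleep(interval)  # re-read after the page has updated
            elements = driver.find_elements(By.CLASS_NAME, "timestamp")
            snapshots.append([el.text for el in elements])
        return snapshots
    finally:
        driver.quit()  # always close the browser, even on errors
```

Each snapshot is a plain list of strings, so it can be parsed afterwards however you like.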

Answer 3

In the end I used a Firefox add-on called Greasemonkey to scrape the site.

https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/

Greasemonkey can capture the dynamically generated data on http://www.quickbid.com.tw/.

How do I scrape dynamically generated data from this website?
