Question

I'm trying to retrieve the second page of these results:

http://raceresults.sportstats.ca/display-results.xhtml?raceid=451

If I click on page 2 at the bottom, it goes to page 2, but the URL stays the same. If I look at the HTTP headers, I can see this cookie:

Set-Cookie: sportstats_preferences="{\"raceId\":451,\"firstRow\":40,
\"category\":\"All Categories\",\"chronosStep\":\"INSTRUCTIONS
\",\"facebookLoggedIn\":false,\"twitterLoggedIn\":false,\"fbServiceId
\":0,\"twServiceId\":0,\"unit\":1}"; Version=1; Max-Age=2592000; 
Expires=Sat, 04-Apr-2015 14:30:28 GMT

I can see that this differs from the first page's cookie in that firstRow is set to 40.
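
To make that difference concrete, here is a minimal sketch that decodes the cookie value quoted above and reads firstRow. It assumes the value is backslash-escaped JSON wrapped in double quotes, exactly as it appears in the header; decode_preferences is a hypothetical helper name used only for illustration.

```python
import json

# Cookie value copied from the Set-Cookie header above. Attributes such as
# Version, Max-Age and Expires are not part of the value and are left out.
raw_value = r'"{\"raceId\":451,\"firstRow\":40,\"category\":\"All Categories\",\"chronosStep\":\"INSTRUCTIONS\",\"facebookLoggedIn\":false,\"twitterLoggedIn\":false,\"fbServiceId\":0,\"twServiceId\":0,\"unit\":1}"'


def decode_preferences(quoted_value):
    """Strip the outer quotes, undo the \\" escaping, and parse the JSON.

    Assumption: the only escape sequence in the value is \\" (true for the
    value shown above, not guaranteed for every cookie the site may send).
    """
    inner = quoted_value[1:-1]            # drop the surrounding double quotes
    unescaped = inner.replace('\\"', '"')  # \" -> "
    return json.loads(unescaped)


prefs = decode_preferences(raw_value)
print(prefs["firstRow"])
```

This only shows what the browser received. Whether the server chooses the page from this cookie alone is not something the header by itself confirms.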

I'm trying to fetch page 2 in Python 3 with the following code:

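#!/usr/bin/env python3
import urllib.request

# Results page from above; the URL is the same for every page of results.
url = 'http://raceresults.sportstats.ca/display-results.xhtml?raceid=451'

opener = urllib.request.build_opener()
# Cookie copied from the Set-Cookie response header, with firstRow set to 40.
cookie = 'sportstats_preferences="{{\\"raceId\\":451,\\"firstRow\\":40,\\"category\\":\\"All Categories\\",\\"chronosStep\\":\\"INSTRUCTIONS\\",\\"facebookLoggedIn\\":false,\\"twitterLoggedIn\\":false,\\"fbServiceId\\":0,\\"twServiceId\\":0,\\"unit\\":1}}"; Version=1; Max-Age=2592000; Expires=Sat, 04-Apr-2015 04:18:36 GMT'
opener.addheaders = [('Cookie', cookie)]
# Fetch the page and print the decoded HTML line by line.
f = opener.open(url).read().decode("utf-8")
for line in f.splitlines():
    print(line)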

But this still just returns the first page of results. Am I going about this the right way? Any ideas on how I can get the second page of results?

Answer 1

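Your best option is probably to use Selenium with its Python package. Selenium lets you open and drive a web browser from Python, so you can interact with the page's next-page button and read the results in your Python script.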

http://www.seleniumhq.org/

https://pypi.python.org/pypi/selenium
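
A rough sketch of that approach follows. It rests on several assumptions that are not confirmed by the question: that the pager entry for page 2 is a link whose visible text is "2", that Firefox and its driver are installed, and that a short fixed pause is enough for the page to refresh after the click. fetch_page_via_browser and main are hypothetical names used only for illustration.

```python
import time

URL = "http://raceresults.sportstats.ca/display-results.xhtml?raceid=451"


def fetch_page_via_browser(driver, url, page_label="2", settle_seconds=2.0):
    """Load url, click the pager link labelled page_label, return the HTML.

    Assumption: the pager entry is an <a> element whose text is page_label.
    "link text" is the string value of selenium's By.LINK_TEXT locator.
    """
    driver.get(url)
    driver.find_element("link text", page_label).click()
    # Crude fixed pause so the page can update after the click; an explicit
    # wait on a known element would be more robust once the markup is known.
    time.sleep(settle_seconds)
    return driver.page_source


def main():
    # Imported here so the helper above can be used with any driver-like
    # object without Selenium being installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.implicitly_wait(10)  # wait up to 10 s for the link to appear
        print(fetch_page_via_browser(driver, URL))
    finally:
        driver.quit()  # always close the browser
```

Calling main() opens a browser window, clicks through to page 2, and prints the resulting HTML. Because the browser handles cookies and any script-driven requests itself, nothing needs to be copied by hand.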

Python 3 urllib.request: send cookie, get results
