Question

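I am trying to write a web crawler for the message board of an Austrian newspaper called derstandard.at. I am interested in the interactions between users, which I want to use for a network analysis. I can retrieve everything I need, but changing the message board page does not work at all.

In Firefox I can reach the page I want by changing one number in the URL, for example page 5: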

http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart
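
The URL pattern above can be sketched in code. This is only an illustration, assuming Python 3's urllib.parse and a hypothetical helper named page_url (neither is part of my script). It builds the address for a given page by setting the seite query parameter and does not fetch anything:

```python
from urllib.parse import urlencode

# Base address of the article, taken from the URL above (no query, no fragment).
BASE = ('http://derstandard.at/1345164506806/'
        'Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren')


def page_url(page):
    """Return the forum URL for the given page number (hypothetical helper)."""
    return BASE + '?' + urlencode({'seite': page})


# Build the addresses for the first five forum pages.
urls = [page_url(n) for n in range(1, 6)]
print(urls[-1])
```

Building the query with urlencode rather than string formatting keeps the parameter correctly escaped if more parameters are added later.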

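When I try to access it from my Python script, I always get page 1.

At first I thought this was because of my user agent, but I changed it to my Firefox user agent and I still always get page 1. Why is that?

Here is the relevant code snippet: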

#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib
from BeautifulSoup import BeautifulSoup

from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:14.0) Gecko/20100101 Firefox/14.0.1'

f_open=MyOpener()

page=BeautifulSoup(f_open.open('http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'))

print page

Answer 1

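According to the OP, my comment on his question solved the problem.

My comment:

Maybe it is the "#". I have heard it sometimes causes errors. Put an r in front of the string, like r'http://derstandard.at/1345164506806/Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren?seite=5#forumstart'

So this seems to have been a simple mistake.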
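
A possible explanation, which the thread itself does not confirm: the r prefix only changes how backslashes are treated in a string literal, so it leaves the "#" untouched. Browsers strip the fragment (#forumstart) before sending the request, while an HTTP client that passes it through may confuse the server into ignoring seite=5. Below is a minimal sketch, assuming Python 3's urllib.parse (the question's code is Python 2), that removes the fragment before the URL is used for a request:

```python
from urllib.parse import urldefrag, urlsplit, parse_qs

url = ('http://derstandard.at/1345164506806/'
       'Umfrage-FPOe-auf-tiefstem-Stand-seit-mehr-als-zwei-Jahren'
       '?seite=5#forumstart')

# A raw-string prefix does not alter '#': both literals are the same string.
same_literal = (r'?seite=5#forumstart' == '?seite=5#forumstart')

# urldefrag splits off everything after '#'; browsers do not send that part.
request_url, fragment = urldefrag(url)

# The page number lives in the query string and survives the split.
page_number = parse_qs(urlsplit(request_url).query)['seite'][0]

print(same_literal)
print(request_url)
print(fragment)
print(page_number)
```

Passing request_url instead of url to the opener would be the thing to try first if the page number is still being ignored.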

Python urllib gives different results than Firefox
