I'm writing a simple HTML scraper for Hulu in Python 2.6 and am having trouble logging in to my account. Here is my code so far:
import urllib
import urllib2
from cookielib import CookieJar
# Create the cookie and redirect handlers
cookies = CookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookies)
redirect_handler = urllib2.HTTPRedirectHandler()
# Build an opener that uses both handlers
opener = urllib2.build_opener(redirect_handler, cookie_handler)
# Encode the login form data (USER and PASS are defined elsewhere)
login_info = {'username': USER, 'password': PASS}
data = urllib.urlencode(login_info)
# Passing data makes this a POST request
req = urllib2.Request("http://www.hulu.com/account/authenticate", data)
test = opener.open(req)  # open the page
print test.read()  # print the HTML result
The code compiles and runs, but all that gets printed is:
Login.onError("Please \074a href=\"/support/login_faq#cant_login\"\076enable cookies\074/a\076 and try again.");
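I think there is an error in how I'm handling cookies, but I can't seem to spot it. I've heard that Mechanize is a very useful module for this kind of program, but since this seems to be the only speed bump, I was hoping to find my bug.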
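A minimal sketch for checking the cookie suspicion, not a confirmed fix: it lists what the jar holds before any request is made. The helper `cookie_names` is a hypothetical name introduced here, and the commented-out page visit rests on the unverified assumption that the site sets a session cookie on an ordinary page load and expects it back on the login POST. The imports fall back to the Python 3 module names so the snippet runs on either version.

```python
try:  # Python 2, as in the question
    import urllib2
    from cookielib import CookieJar
except ImportError:  # Python 3 equivalents of the same modules
    import urllib.request as urllib2
    from http.cookiejar import CookieJar

cookies = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))


def cookie_names(jar):
    """Return the names of the cookies currently stored in the jar."""
    return [cookie.name for cookie in jar]


# Before any request the jar is empty, so a first POST carries no cookies.
print(cookie_names(cookies))

# Hypothetical next step (assumption: a normal page visit sets a cookie).
# Fetch a page with the SAME opener, then inspect the jar again:
# opener.open("http://www.hulu.com/")
# print(cookie_names(cookies))
```

If the jar is still empty after a page visit, the cookie may be set some other way (for example by JavaScript), which `urllib2` would not see.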