urllib2, Python: garbage response when opening a specific website

Time: 2014-02-07 00:06:22

Tags: python urllib2

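I've been looking around and managed to cobble together some code that logs me in to the website http://forums.somethingawful.com

It works, and I can tell from the response that it works.

When I use the same urllib2 opener that I created for the login to open this part of the site, http://forums.somethingawful.com/attachment.php?attachmentid=300 (which I need to be logged in to view), I get the response "ÿØÿà".

Edit: http://i.imgur.com/PmWl1s4.png

I've included a screenshot of the target page as it appears when logged in, in case that helps.

Any ideas why?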

"""
# Script to log in to website and store cookies. 
# run as: python web_login.py USERNAME PASSWORD
#
# sources of code include:
# 
# http://stackoverflow.com/questions/2954381/python-form-post-using-urllib2-also-question-on-saving-using-cookies
# http://stackoverflow.com/questions/301924/python-urllib-urllib2-httplib-confusion
# http://www.voidspace.org.uk/python/articles/cookielib.shtml
#
# mashed together by Martin Chorley
# 
# Licensed under a Creative Commons Attribution ShareAlike 3.0 Unported License.
# http://creativecommons.org/licenses/by-sa/3.0/
"""

import urllib, urllib2
import cookielib
import sys

import urlparse
from BeautifulSoup import BeautifulSoup as bs

class WebLogin(object):

    def __init__(self, username, password):

        # url for website we want to log in to
        self.base_url = 'http://forums.somethingawful.com/'
        # login action we want to post data to
        # could be /login or /account/login or something similar
        self.login_action = '/account.php?'
        # file for storing cookies
        self.cookie_file = 'login.cookies'

        # user provided username and password
        self.username = username
        self.password = password

        # set up a cookie jar to store cookies
        self.cj = cookielib.MozillaCookieJar(self.cookie_file)

        # set up opener to handle cookies, redirects etc
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )

        # pretend we're a web browser and not a python script
        self.opener.addheaders = [('User-agent', 
            ('Chrome/16.0.912.77'))
        ]

        # open the front page of the website to set and save initial cookies
        response = self.opener.open(self.base_url)
        self.cj.save()

        # try and log in to the site
        response = self.login()

        response2 = self.opener.open("http://forums.somethingawful.com/attachment.php?attachmentid=300")

        print response2.read() + "LLLLLL"

    # method to do login
    def login(self):

        # parameters for login action
        # may be different for different websites
        # check html source of website for specifics
        login_data = urllib.urlencode({
              'action': 'login',
              'username': self.username,
              'password': self.password
        })

        # construct the url
        login_url = self.base_url + self.login_action
        # then open it
        response = self.opener.open(login_url, login_data)
        # save the cookies and return the response
        self.cj.save()
        return response

if __name__ == "__main__":

    username = "username"
    password = "password"

    # initialise and login to the website
    test = WebLogin(username, password)

1 answer:

Answer 0 (score: 1)

Try this instead:

import urllib,urllib2,cookielib

def login(username,password):
    # one opener with a cookie jar, so cookies set by the site are reused on later requests
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    url1 = "http://forums.somethingawful.com/attachment.php?attachmentid=300"
    url2 = "http://forums.somethingawful.com/account.php?action=loginform"
    # urlencode escapes characters such as & or = inside the credentials
    data = urllib.urlencode({"username": username, "password": password})
    response = opener.open(url1)
    response = opener.open(url2,data)
    return response.read()

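P.S.: I wrote it as a standalone function; if it works for you, you can integrate it into your class. Also, the call to opener.open(url1) may be redundant; a valid username/password pair is needed to verify...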
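
A note on the "ÿØÿà" output itself, which the answer above does not address. Read as Latin-1, those four characters are the bytes FF D8 FF E0, the signature at the start of a JPEG (JFIF) file. So the response may not be garbage at all: attachment.php may simply be returning the image, and printing it to a terminal shows raw bytes. The sketch below is a minimal illustration under that assumption. The helper names `looks_like_jpeg` and `save_body` are hypothetical and not part of the original post, and the code takes plain bytes so it runs the same on Python 2 and Python 3.

```python
# Hypothetical helpers, not from the original question or answer.
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the lead byte of the next marker


def looks_like_jpeg(body):
    """Return True if the raw response bytes start with the JPEG signature."""
    return body[:3] == JPEG_MAGIC


def save_body(body, path):
    """Write the response body unchanged; binary mode avoids newline translation."""
    with open(path, "wb") as handle:
        handle.write(body)


# The four characters quoted in the question, turned back into bytes via Latin-1.
sample = u"\xff\xd8\xff\xe0".encode("latin-1")
is_jpeg = looks_like_jpeg(sample)
```

If that assumption holds, replacing the print in the question with something like save_body(response2.read(), "attachment_300.jpg") (the file name is only an example) should leave a viewable image on disk.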