Question

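I am relatively new to Python and BeautifulSoup. I simply need to parse the response to a particular request; I am using urllib2 for the request and response and BeautifulSoup4 for the parsing.

I have used both before without any problems. For this particular project, however, I am getting a strange error.

Here is part of the code I have written: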

import cookielib
import urllib2

from bs4 import BeautifulSoup


class WebLogin(object):
    def __init__(self, username, password, targetSite, loginUrl):
        # URL of the website we want to log in to
        self.base_url = targetSite
        self.loginUrl = self.base_url + loginUrl

        # user-supplied username and password
        self.username = username
        self.password = password

        # file for storing cookies
        self.cookie_file = 'login.cookies'

        # set up a cookie jar to store cookies
        self.cj = cookielib.MozillaCookieJar(self.cookie_file)
        # set up an opener to handle cookies, redirects etc.
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )

        # pretend we're a web browser and not a Python script
        self.opener.addheaders = [
            ('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:35.0) Gecko/20100101 Firefox/35.0')
        ]

        # open the front page of the website to set and save the initial
        # cookies and to retrieve the CSRF token required for login
        # (__FK is the CSRF token here)
        response = self.opener.open(self.base_url)
        self.cj.save()
        print response.info().get('Content-Encoding')
        initialResponse = response.read()
        print "\nResponse received for \n\n", initialResponse

        # parse the response for the CSRF token
        soup = BeautifulSoup(initialResponse)

The print "\nResponse received for \n\n", initialResponse statement above shows the correct HTML response. However, when I then call soup = BeautifulSoup(initialResponse), I get the following error:

Error

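Please let me know what is going wrong here. What am I missing? Why can't I make soup out of response.read()?

I also tried .decode('utf-8') just in case, but that did not help.

In case the screenshot above is not clear, here is the error again: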
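For reference, this is roughly the kind of decoding check meant here, written as a small standalone sketch. It assumes the only compression a server would report in Content-Encoding is gzip and that the page is UTF-8; body_to_text and sample_html are hypothetical names used for illustration, and the __FK field in the sample markup is made up. It is a way to rule the body encoding in or out, not a confirmed fix.

```python
import gzip
import io


def body_to_text(raw_bytes, content_encoding=None, charset="utf-8"):
    """Return the response body as text, un-gzipping it first if needed."""
    if content_encoding and content_encoding.lower() == "gzip":
        # The body is gzip-compressed, so decompress it before decoding.
        raw_bytes = gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)).read()
    # Replace undecodable bytes instead of raising, to keep the check simple.
    return raw_bytes.decode(charset, "replace")


# Made-up markup standing in for the real page.
sample_html = u"<html><body><input name='__FK' value='token'></body></html>"

# Build a gzip-compressed copy to imitate a compressed response body.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
    gz.write(sample_html.encode("utf-8"))
compressed = buffer.getvalue()

plain_text = body_to_text(sample_html.encode("utf-8"))
unzipped_text = body_to_text(compressed, content_encoding="gzip")
```

In the code above, the equivalent call would pass response.info().get('Content-Encoding') as content_encoding and the result of response.read() as raw_bytes.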

Traceback (most recent call last):
  File "flipkartLogin.py", line 64, in <module>
    WebLogin(username, password, targetSite, loginUrl);
  File "flipkartLogin.py", line 43, in __init__
    soup = BeautifulSoup(initialResponse);
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
    self._feed()
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 146, in feed
    parser.feed(markup)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 324, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 48, in handle_starttag
    self.soup.handle_starttag(name, None, None, dict(attrs))
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 298, in handle_starttag
    self.currentTag, self.previous_element)
  File "/Library/Python/2.7/site-packages/bs4/element.py", line 749, in __init__
    self.name, attrs)
  File "/Library/Python/2.7/site-packages/bs4/builder/__init__.py", line 160, in _replace_cdata_list_attribute_values
    values = whitespace_re.split(value)
TypeError: expected string or buffer
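One possible reading of this traceback, not confirmed for this page: the last frame calls whitespace_re.split(value), and the re module raises a TypeError like this one when value is not a string. The standard-library HTMLParser reports None as the value of an attribute written without one (for example <div class>), so a page containing such a tag is one way a None could reach that call. The sketch below reproduces that behaviour using only the standard library. AttrCollector, split_attribute and the whitespace pattern are illustrative stand-ins rather than bs4 code, and the exact exception text differs between Python versions.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

# Stand-in for the pattern named in the traceback; the real bs4 pattern may differ.
whitespace_re = re.compile(r"\s+")


class AttrCollector(HTMLParser):
    """Record every (name, value) attribute pair the parser reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.extend(attrs)


def split_attribute(value):
    """Split an attribute value on whitespace, reporting a TypeError as text."""
    try:
        return whitespace_re.split(value)
    except TypeError as exc:
        return "TypeError: %s" % exc


collector = AttrCollector()
# The first tag has a valueless class attribute; the second has a normal one.
collector.feed('<div class><p class="a b">text</p></div>')

results = [(name, split_attribute(value)) for name, value in collector.seen]
for name, outcome in results:
    print("%s -> %s" % (name, outcome))
```

If this turns out to be the cause, passing a different parser name that bs4 supports, such as "lxml" or "html5lib" (each installed separately), or upgrading bs4, may avoid the error.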


0 Answers: