mechanize open function gives BadStatusLine: '' error

Asked: 2013-08-18 12:01:13

Tags: python-2.7 mechanize-python

I wrote this script to download the lyrics of songs and store them in text files:

>>> lis = os.listdir(r'D:\Phone\Sounds')
>>> for i in lis:
    print i
    br.open('http://www.azlyrics.com/')  # THE PROBLEM
    br.select_form(nr=0)
    track = eyed3.load(i).tag
    if track.artist is not None:
        # Drop any "ft." (featuring) suffix from the artist name
        ft = track.artist.find('ft.')
        if ft != -1:
            br['q'] = track.title + ' ' + track.artist[:ft]
        else:
            br['q'] = track.title + ' ' + track.artist
    else:
        br['q'] = track.title
    br.submit()
    s = BeautifulSoup(br.response().read())
    a = s.find('div', {'class': 'sen'})
    if a is not None:
        s = BeautifulSoup(urllib.urlopen(a.find('a')['href']))
        file = open(i.replace('.mp3', '.txt'), 'w')
        file.write(str(s.find('div', {'style': 'margin-left:10px;margin-right:10px;'})).replace('<br />', '\n'))
        file.close()
    else:
        print 'Lyrics not found'

This seemed to work for a while, and I downloaded the lyrics of a few songs, but then it suddenly raised a BadStatusLine error:

Heartbreaker.mp3
<response_seek_wrapper at 0x4af6f08L whose wrapped object = <closeable_response at 0x4cb9288L whose fp = <socket._fileobject object at 0x00000000047A2480>>>
<response_seek_wrapper at 0x4b1b888L whose wrapped object = <closeable_response at 0x4cc0048L whose fp = <socket._fileobject object at 0x00000000047A2570>>>
Heartless (The Fray Cover).mp3
<response_seek_wrapper at 0x4b22d08L whose wrapped object = <closeable_response at 0x4b15988L whose fp = <socket._fileobject object at 0x00000000047B2750>>>
<response_seek_wrapper at 0x4cb9388L whose wrapped object = <closeable_response at 0x4b1b448L whose fp = <socket._fileobject object at 0x000000000362AED0>>>
Lyrics not found
Heartless.mp3
<response_seek_wrapper at 0x4cc0288L whose wrapped object = <closeable_response at 0x4b01108L whose fp = <socket._fileobject object at 0x000000000362AE58>>>
<response_seek_wrapper at 0x4b15808L whose wrapped object = <closeable_response at 0x47a4508L whose fp = <socket._fileobject object at 0x000000000362A6D8>>>
Here Without You.mp3
<response_seek_wrapper at 0x4b1b3c8L whose wrapped object = <closeable_response at 0x4916508L whose fp = <socket._fileobject object at 0x000000000362A480>>>
<response_seek_wrapper at 0x47a4fc8L whose wrapped object = <closeable_response at 0x37830c8L whose fp = <socket._fileobject object at 0x000000000362A0C0>>>
Hero.mp3
<response_seek_wrapper at 0x4930408L whose wrapped object = <closeable_response at 0x4cced48L whose fp = <socket._fileobject object at 0x00000000047A2228>>>
<response_seek_wrapper at 0x453ca48L whose wrapped object = <closeable_response at 0x4b23f88L whose fp = <socket._fileobject object at 0x00000000047A2048>>>
Hey Jude.mp3
<response_seek_wrapper at 0x3783808L whose wrapped object = <closeable_response at 0x4cd71c8L whose fp = <socket._fileobject object at 0x00000000047A2A20>>>
<response_seek_wrapper at 0x4ccee48L whose wrapped object = <closeable_response at 0x4cd7c08L whose fp = <socket._fileobject object at 0x00000000047A2B10>>>
Hey, Soul Sister.mp3

Traceback (most recent call last):
  File "<pyshell#23>", line 3, in <module>
    br.open('http://www.azlyrics.com/')
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 230, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "build\bdist.win-amd64\egg\mechanize\_opener.py", line 193, in open
    response = urlopen(self, req, data)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 344, in _open
    '_open', req)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
    result = func(*args)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1142, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1116, in do_open
    r = h.getresponse()
  File "D:\Programming\Python\lib\httplib.py", line 1027, in getresponse
    response.begin()
  File "D:\Programming\Python\lib\httplib.py", line 407, in begin
    version, status, reason = self._read_status()
  File "D:\Programming\Python\lib\httplib.py", line 371, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''

So why did the br.open function suddenly stop working? Thanks in advance.

1 answer:

Answer 0 (score: 0)

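httplib raises this error when it does not understand the response status code. Quoting the docs:

  A subclass of HTTPException. Raised if a server responds with a HTTP status code that we don't understand.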
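
In Python 2's httplib, the empty string in BadStatusLine: '' means that no status line was read at all, which typically happens when the server closes the connection without replying. Since the loop in the question worked for several songs before failing, one plausible reading (an assumption, not something the traceback proves) is that the failure is transient. Under that assumption, a retry wrapper could look roughly like the sketch below. The names open_with_retry, attempts, delay and sleep are hypothetical and not part of mechanize or httplib.

```python
import time

try:
    from httplib import BadStatusLine  # Python 2, as in the question
except ImportError:
    from http.client import BadStatusLine  # Python 3


def open_with_retry(open_page, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call open_page(url), retrying when the server sends no status line.

    open_page is any callable such as br.open; attempts, delay and sleep
    are hypothetical knobs introduced for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return open_page(url)
        except BadStatusLine:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay * attempt)  # wait a little longer after each failure
```

In the question's loop this would replace the direct call, for example open_with_retry(br.open, 'http://www.azlyrics.com/'). Pausing between requests may also help if the site is throttling rapid automated traffic, though that too is a guess.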

I don't get any error when running br.open('http://www.azlyrics.com/'), so the problem is on your side.

Most likely you are using a proxy; take a look at "Python's mechanize proxy support".
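
One quick way to check the proxy theory is to ask the standard library which proxies it detects on the machine. The sketch below is only an illustration: describe_proxies is a hypothetical helper name, and it reports what urllib sees, which may differ from what mechanize ends up using.

```python
try:
    from urllib import getproxies  # Python 2, as in the question
except ImportError:
    from urllib.request import getproxies  # Python 3


def describe_proxies(proxies=None):
    """Return sorted 'scheme -> proxy URL' strings for the given proxies.

    With no argument, the proxies are looked up via getproxies(), which
    reads the *_proxy environment variables and, on some platforms, the
    operating system's proxy settings.
    """
    if proxies is None:
        proxies = getproxies()
    return ['%s -> %s' % (scheme, url) for scheme, url in sorted(proxies.items())]
```

If this returns a non-empty list, a proxy is configured. mechanize's Browser.set_proxies accepts a dict of the same shape (scheme mapped to proxy address), so the proxy can be set explicitly there.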

UPD: Try this:

import mechanize

br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# Turn on mechanize's debugging output to see what the server actually returns
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)

br.open('http://www.azlyrics.com')

print br.response().read()

Hope that helps.