I wrote this script to download song lyrics and store them in a text file:
>>> lis = os.listdir('D:\Phone\Sounds')
>>> for i in lis:
        print i
        br.open('http://www.azlyrics.com/') # THE PROBLEM
        br.select_form(nr=0)
        track = eyed3.load(i).tag
        if(track.artist != None):
            ft = track.artist.find('ft.')
            if(ft != -1):
                br['q'] = track.title + ' ' + track.artist[:ft]
            else:
                br['q'] = track.title + ' ' + track.artist
        else:
            br['q'] = track.title
        br.submit()
        s = BeautifulSoup(br.response().read())
        a = s.find('div',{'class':'sen'})
        if(a != None):
            s = BeautifulSoup(urllib.urlopen(a.find('a')['href']))
            file = open(i.replace('.mp3','.txt'),'w')
            file.write(str(s.find('div',{'style':'margin-left:10px;margin-right:10px;'})).replace('<br />','\n'))
            file.close()
        else:
            print 'Lyrics not found'
This seemed to work for a while and I downloaded the lyrics for some songs, but then it suddenly raised a BadStatusLine error:
Heartbreaker.mp3
<response_seek_wrapper at 0x4af6f08L whose wrapped object = <closeable_response at 0x4cb9288L whose fp = <socket._fileobject object at 0x00000000047A2480>>>
<response_seek_wrapper at 0x4b1b888L whose wrapped object = <closeable_response at 0x4cc0048L whose fp = <socket._fileobject object at 0x00000000047A2570>>>
Heartless (The Fray Cover).mp3
<response_seek_wrapper at 0x4b22d08L whose wrapped object = <closeable_response at 0x4b15988L whose fp = <socket._fileobject object at 0x00000000047B2750>>>
<response_seek_wrapper at 0x4cb9388L whose wrapped object = <closeable_response at 0x4b1b448L whose fp = <socket._fileobject object at 0x000000000362AED0>>>
Lyrics not found
Heartless.mp3
<response_seek_wrapper at 0x4cc0288L whose wrapped object = <closeable_response at 0x4b01108L whose fp = <socket._fileobject object at 0x000000000362AE58>>>
<response_seek_wrapper at 0x4b15808L whose wrapped object = <closeable_response at 0x47a4508L whose fp = <socket._fileobject object at 0x000000000362A6D8>>>
Here Without You.mp3
<response_seek_wrapper at 0x4b1b3c8L whose wrapped object = <closeable_response at 0x4916508L whose fp = <socket._fileobject object at 0x000000000362A480>>>
<response_seek_wrapper at 0x47a4fc8L whose wrapped object = <closeable_response at 0x37830c8L whose fp = <socket._fileobject object at 0x000000000362A0C0>>>
Hero.mp3
<response_seek_wrapper at 0x4930408L whose wrapped object = <closeable_response at 0x4cced48L whose fp = <socket._fileobject object at 0x00000000047A2228>>>
<response_seek_wrapper at 0x453ca48L whose wrapped object = <closeable_response at 0x4b23f88L whose fp = <socket._fileobject object at 0x00000000047A2048>>>
Hey Jude.mp3
<response_seek_wrapper at 0x3783808L whose wrapped object = <closeable_response at 0x4cd71c8L whose fp = <socket._fileobject object at 0x00000000047A2A20>>>
<response_seek_wrapper at 0x4ccee48L whose wrapped object = <closeable_response at 0x4cd7c08L whose fp = <socket._fileobject object at 0x00000000047A2B10>>>
Hey, Soul Sister.mp3
Traceback (most recent call last):
File "<pyshell#23>", line 3, in <module>
br.open('http://www.azlyrics.com/')
File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "build\bdist.win-amd64\egg\mechanize\_mechanize.py", line 230, in _mech_open
response = UserAgentBase.open(self, request, data)
File "build\bdist.win-amd64\egg\mechanize\_opener.py", line 193, in open
response = urlopen(self, req, data)
File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 344, in _open
'_open', req)
File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 332, in _call_chain
result = func(*args)
File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1142, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "build\bdist.win-amd64\egg\mechanize\_urllib2_fork.py", line 1116, in do_open
r = h.getresponse()
File "D:\Programming\Python\lib\httplib.py", line 1027, in getresponse
response.begin()
File "D:\Programming\Python\lib\httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "D:\Programming\Python\lib\httplib.py", line 371, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
So why did the br.open function suddenly stop working? Thanks in advance.
Answer 0 (score: 0)
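The error is raised when httplib does not understand the response status code. Quoting the docs:
A subclass of HTTPException. Raised if a server responds with a HTTP status code that we don't understand.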
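In the traceback the status line is an empty string, which usually means the connection was closed before any response arrived. If that happens only occasionally, catching the exception and retrying after a pause may be enough. The following is a minimal sketch under that assumption, not a confirmed fix; `open_with_retry`, `open_func`, `attempts`, `delay` and `sleep` are hypothetical names introduced here for illustration.

```python
import time

try:  # Python 3
    from http.client import BadStatusLine
except ImportError:  # Python 2, as used in the question
    from httplib import BadStatusLine


def open_with_retry(open_func, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call open_func(url), retrying when BadStatusLine is raised.

    All parameter names are hypothetical; in the question's script
    open_func would be br.open.
    """
    for attempt in range(1, attempts + 1):
        try:
            return open_func(url)
        except BadStatusLine:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)  # pause before the next attempt
```

In the question's loop this would be called as `open_with_retry(br.open, 'http://www.azlyrics.com/')` in place of the bare `br.open(...)`.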
I don't get any error when running br.open('http://www.azlyrics.com/'), so the problem is on your side.
Most likely you are using a proxy; take a look at Python's mechanize proxy support.
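One quick way to check the proxy theory is to print the proxy settings that the standard library detects on the machine. This is only a sketch: it assumes mechanize picks up its default proxies from the same environment settings, which is worth verifying against the mechanize documentation, and `describe_proxies` is a hypothetical helper name.

```python
try:  # Python 3
    from urllib.request import getproxies
except ImportError:  # Python 2, as used in the question
    from urllib import getproxies


def describe_proxies(proxies):
    """Return a one-line summary of a scheme -> proxy URL mapping."""
    if not proxies:
        return 'no proxy configured'
    return ', '.join('%s via %s' % (scheme, proxies[scheme])
                     for scheme in sorted(proxies))


if __name__ == '__main__':
    # getproxies() reads the system or environment proxy settings
    print(describe_proxies(getproxies()))
```

If this prints anything other than `no proxy configured`, requests made with default settings may be going through that proxy.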
UPD: Try this:
import mechanize
br = mechanize.Browser()
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.set_debug_http(True)
br.set_debug_redirects(True)
br.set_debug_responses(True)
br.open('http://www.azlyrics.com')
print br.response().read()
Hope that helps.