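I'm looping through a set of pages while scraping, but one particular page keeps throwing an error even though the other pages work fine.
To test the problematic page, I wrote: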
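import urllib.request as uReq  # assumed import; the snippet uses the alias uReq without showing it

class AppURLopener(uReq.FancyURLopener):
    version = "Mozilla/5.0"  # User-Agent string sent with each request

url = 'http://snesguide.com/genre/beatemup/2'
opener = AppURLopener()
try:
    page_html = opener.open(url).read()
except Exception as e:
    print("Error loading webpage:", e)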
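For comparison, here is a minimal sketch of the same request built with `urllib.request.Request` and `urlopen` instead of `FancyURLopener`, which the Python 3 documentation marks as deprecated. It sends the same `Mozilla/5.0` User-Agent string. The helper names `build_request` and `fetch_page` and the 30-second timeout are hypothetical choices for illustration. This is not expected to cure the error by itself, because both interfaces read the response body through `http.client`.

```python
import urllib.request

# Same User-Agent string as the AppURLopener subclass above.
USER_AGENT = "Mozilla/5.0"


def build_request(url):
    """Return a GET Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(url, timeout=30):
    """Download url and return the raw response body as bytes.

    timeout (seconds) is a hypothetical default, not taken from the post.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Usage would be `page_html = fetch_page('http://snesguide.com/genre/beatemup/2')` inside the same `try`/`except` as above.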
With url = 'http://snesguide.com/genre/beatemup/1' it works.
With url = 'http://snesguide.com/genre/beatemup/3' it also works.
With url = 'http://snesguide.com/genre/beatemup/2' it raises the following:
(I should mention that all three URLs load fine in a browser!)
Traceback (most recent call last):
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 513, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 563, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 548, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/PHIL/Desktop/pyscripts/web-scraping/NES_DB_Scrape/scrape webpage connection test.py", line 17, in <module>
    page_html = opener.open(url).read()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\tempfile.py", line 483, in func_wrapper
    return func(*args, **kwargs)
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 456, in read
    return self._readall_chunked()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 570, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(29049 bytes read)
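The traceback shows `http.client` reading a chunked response: it expected the next chunk-size line but got an empty string (`b''`), meaning the connection ended before the terminating chunk, after 29049 bytes had arrived. One commonly used workaround, sketched here as an assumption rather than a confirmed fix, is to catch `http.client.IncompleteRead` and keep its `partial` attribute, which holds the bytes received so far. The helper name `read_allowing_partial` is hypothetical, and the returned body may be missing the end of the page, so a scraper should check that the data it needs is actually present.

```python
import http.client


def read_allowing_partial(response):
    """Read a response body; if the server ends a chunked response early,
    return whatever bytes arrived instead of raising."""
    try:
        return response.read()
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes received before the connection ended
        # (29049 bytes in the traceback above). The page may be truncated.
        return exc.partial
```

With the opener above this would be used as `page_html = read_allowing_partial(opener.open(url))`.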
Any ideas?