Error when trying to open a page with urllib (Python 3)

Date: 2018-09-28 17:30:13

Tags: python python-3.x web-scraping urllib

I'm looping through a set of pages while scraping, but one particular page keeps raising an error, even though the other pages work fine.

To test the problematic page, I wrote:

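import urllib.request as uReq

# FancyURLopener is deprecated; the subclass only overrides the User-Agent.
class AppURLopener(uReq.FancyURLopener):
    version = "Mozilla/5.0"

url = 'http://snesguide.com/genre/beatemup/2'
opener = AppURLopener()

try:
    page_html = opener.open(url).read()
except Exception as err:  # report the actual error instead of hiding it
    print("Error loading webpage:", err)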
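FancyURLopener has been deprecated since Python 3.3. Below is a minimal sketch of the same request using the current API, assuming the only purpose of the subclass is to send a browser-like User-Agent: urllib.request.Request carries the header, and the bare except is replaced by the specific exceptions that urllib and http.client raise, so a failure is reported rather than swallowed. `USER_AGENT`, `build_request` and `fetch` are illustrative names, not part of the original script.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: the custom opener existed only to send this User-Agent.
USER_AGENT = "Mozilla/5.0"


def build_request(url: str) -> urllib.request.Request:
    """Attach the User-Agent header to a single request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download a page; print which kind of failure happened, then re-raise."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        print(f"HTTP {err.code} for {url}")
        raise
    except urllib.error.URLError as err:  # DNS failure, refused connection, ...
        print(f"Could not reach {url}: {err.reason}")
        raise
    except http.client.IncompleteRead as err:  # body cut off mid-transfer
        print(f"Truncated body for {url}: {len(err.partial)} bytes received")
        raise
```

For example, `page_html = fetch('http://snesguide.com/genre/beatemup/2')`. HTTPError is caught before URLError because it is a subclass of it. This only changes how the request is sent; if the server really cuts off the body, an IncompleteRead is still raised, it is just no longer hidden.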

url = 'http://snesguide.com/genre/beatemup/1' works.

url = 'http://snesguide.com/genre/beatemup/3' works as well.

url = 'http://snesguide.com/genre/beatemup/2' raises the following:

(I should mention that all three URLs load fine in a browser!)

Traceback (most recent call last):
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 513, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 563, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 548, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/PHIL/Desktop/pyscripts/web-scraping/NES_DB_Scrape/scrape webpage connection test.py", line 17, in <module>
    page_html = opener.open(url).read()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\tempfile.py", line 483, in func_wrapper
    return func(*args, **kwargs)
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 456, in read
    return self._readall_chunked()
  File "C:\Users\PHIL\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 570, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(29049 bytes read)
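The traceback points into http.client's chunked-transfer decoder: the response used `Transfer-Encoding: chunked`, 29049 bytes of chunk data were decoded, and then the next chunk-size line came back empty (`int(b'', 16)`), which is consistent with the connection ending before the terminating zero-length chunk. Why the server does that only for page 2 can't be determined from the traceback alone. The sketch below reproduces the same exception chain offline by feeding a hand-made truncated response to http.client.HTTPResponse through a fake socket; `FakeSocket`, `TRUNCATED`, `COMPLETE`, `open_response` and `read_body_allow_truncation` are hypothetical helpers for illustration, not part of urllib. It also shows that the bytes received before the cut are kept on the exception's `partial` attribute.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical test helper: HTTPResponse only needs sock.makefile()."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, mode: str, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


# Made-up response: one 5-byte chunk, then the stream ends before the
# terminating zero-length chunk ("0\r\n\r\n") that chunked encoding requires.
TRUNCATED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
)
COMPLETE = TRUNCATED + b"0\r\n\r\n"


def open_response(raw: bytes) -> http.client.HTTPResponse:
    """Parse the status line and headers of a canned response."""
    resp = http.client.HTTPResponse(FakeSocket(raw))
    resp.begin()
    return resp


def read_body_allow_truncation(resp: http.client.HTTPResponse) -> bytes:
    """Return the whole body, or only what arrived before the stream ended."""
    try:
        return resp.read()
    except http.client.IncompleteRead as err:
        # err.partial holds the chunks decoded so far
        # (29049 bytes in the traceback above).
        return err.partial


if __name__ == "__main__":
    try:
        open_response(TRUNCATED).read()
    except http.client.IncompleteRead as err:
        print("IncompleteRead, partial =", err.partial)
```

Catching IncompleteRead and using `err.partial` is a pragmatic workaround when the truncated HTML still contains what the scraper needs, but it silently accepts an incomplete page, so it is worth checking that the expected content is actually present.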

Any ideas?
