为了从常见爬网中读取少量文件,我编写了这个脚本
Traceback (most recent call last):
File "./warc_mapper_full.py", line 42, in <module>
for num, record in enumerate(f):
File "/usr/lib/python2.7/site-packages/warc/warc.py", line 393, in __iter__
record = self.read_record()
File "/usr/lib/python2.7/site-packages/warc/warc.py", line 364, in read_record
self.finish_reading_current_record()
File "/usr/lib/python2.7/site-packages/warc/warc.py", line 358, in finish_reading_current_record
self.current_payload.read()
File "/usr/lib/python2.7/site-packages/warc/utils.py", line 59, in read
return self._read(self.length)
File "/usr/lib/python2.7/site-packages/warc/utils.py", line 69, in _read
content = self.buf + self.fileobj.read(size)
File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 67, in read
result = super(GzipStreamFile, self).read(*args, **kwargs)
File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 48, in readinto
data = self.read(len(b))
File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 38, in read
raw = self.stream.read(io.DEFAULT_BUFFER_SIZE)
File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 400, in read
data = self.resp.read(size)
File "/usr/lib/python2.7/site-packages/boto/connection.py", line 413, in read
return http_client.HTTPResponse.read(self, amt)
File "/usr/lib64/python2.7/httplib.py", line 602, in read
s = self.fp.read(amt)
File "/usr/lib64/python2.7/socket.py", line 380, in read
data = self._sock.recv(left)
File "/usr/lib64/python2.7/ssl.py", line 736, in recv
return self.read(buflen)
File "/usr/lib64/python2.7/ssl.py", line 630, in read
v = self._sslobj.read(len or 1024)
ssl.SSLError: ('The read operation timed out',)
每行是warc文件的关键字。当我运行此脚本来分析5个文件时,我得到了这个异常
{{1}}
我经常跑了很多次。以上异常每次都发生。问题在哪里?