S3 read operation times out when reading Common Crawl data

Time: 2017-01-02 06:28:33

Tags: python amazon-s3 boto common-crawl

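To read a small number of files from Common Crawl, I wrote a script, warc_mapper_full.py. Running it produces the following traceback: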

Traceback (most recent call last):
  File "./warc_mapper_full.py", line 42, in <module>
    for num, record in enumerate(f):
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 393, in __iter__
    record = self.read_record()
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 364, in read_record
    self.finish_reading_current_record()
  File "/usr/lib/python2.7/site-packages/warc/warc.py", line 358, in finish_reading_current_record
    self.current_payload.read()
  File "/usr/lib/python2.7/site-packages/warc/utils.py", line 59, in read
    return self._read(self.length)
  File "/usr/lib/python2.7/site-packages/warc/utils.py", line 69, in _read
    content = self.buf + self.fileobj.read(size)
  File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 67, in read
    result = super(GzipStreamFile, self).read(*args, **kwargs)
  File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 48, in readinto
    data = self.read(len(b))
  File "/home/hpcnl/Documents/kics/current_work/aws/tasks/warc-analysis/src/gzipstream/gzipstream/gzipstreamfile.py", line 38, in read
    raw = self.stream.read(io.DEFAULT_BUFFER_SIZE)
  File "/usr/lib/python2.7/site-packages/boto/s3/key.py", line 400, in read
    data = self.resp.read(size)
  File "/usr/lib/python2.7/site-packages/boto/connection.py", line 413, in read
    return http_client.HTTPResponse.read(self, amt)
  File "/usr/lib64/python2.7/httplib.py", line 602, in read
    s = self.fp.read(amt)
  File "/usr/lib64/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib64/python2.7/ssl.py", line 736, in recv
    return self.read(buflen)
  File "/usr/lib64/python2.7/ssl.py", line 630, in read
    v = self._sslobj.read(len or 1024)
ssl.SSLError: ('The read operation timed out',)

Each input line is the key of a WARC file. When I run the script to analyze 5 files, I get the exception shown in the traceback above.

I have run it many times, and the above exception occurs every time. Where is the problem?
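One possible way to cope with a read timeout like the one in the traceback, offered as a sketch under assumptions and not as a confirmed fix: download each WARC file to local disk first, retrying the download when it times out, and only then parse the local copy. The helper name `call_with_retries` and its parameters are hypothetical and not part of boto or the warc library. The commented usage assumes `key` is a `boto.s3.key.Key` and `local_path` is a writable path chosen by the caller.

```python
import socket
import ssl
import time


def call_with_retries(func, max_attempts=3, delay_seconds=1.0,
                      retry_on=(ssl.SSLError, socket.timeout)):
    """Call func() and retry it when it raises one of the retry_on errors.

    Hypothetical helper for illustration; not part of boto or warc.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Simple linear backoff before the next attempt.
            time.sleep(delay_seconds * attempt)


# Assumed usage with boto (not executed here):
#   call_with_retries(lambda: key.get_contents_to_filename(local_path))
#   ...then open local_path with the WARC reader instead of streaming from S3.
```

Retrying the whole download is chosen here because a streaming read that fails midway cannot simply be repeated from the same position, whereas a fresh download restarts cleanly on every attempt.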

0 answers:

No answers