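Although I have read some material on this topic, I still don't understand how to implement a block of code that writes large text files without crashing.
As far as I know, it should be done line by line, but the implementations I have seen only work with files that already exist, whereas I want to create and write the file inside the block, on every iteration of a loop.
Here is the code block (it is wrapped in a try/except):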
fileW = open(str(articleDate.title)+"-WC.txt", 'wb')
fileW.write(getText.encode('utf-8', errors='replace').strip()+ str(articleDate.publish_date).encode('utf-8').strip())
fileW.close()
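The reason I think I need another way of writing to the file is that I keep seeing this exception raised. The recurring 'chunk' keyword suggests that the write() method cannot handle this amount of text: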
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 513, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 563, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 548, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "webcrawl.py", line 102, in <module>
    writeFiles()
  File "webcrawl.py", line 83, in writeFiles
    extractor = Extractor(extractor='ArticleExtractor', url=urls)
  File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 39, in __init__
    connection = urllib2.urlopen(request)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 753, in http_error_302
    fp.read()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 456, in read
    return self._readall_chunked()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 570, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(0 bytes read)
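I know that the exception name at the bottom commonly shows up because the library name 'httplibs' from 'python 2' was changed to 'urllibs', but the wrapper I'm using is python 3 compliant, so I'm fairly sure this is a writing issue. Any help would be appreciated.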
Answer 0 (score: 1)
You can use a context manager to make sure the file is closed at the end of each operation:
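import contextlib

@contextlib.contextmanager
def write_to(filename, ops='a'):
    # Open in append mode by default so each chunk is added to the end of the file.
    f = open(filename, ops)
    try:
        yield f
    finally:
        # Close the file even if the body of the with block raises.
        f.close()

# data is assumed to be an iterable of text chunks defined elsewhere.
for chunk in data:
    with write_to('filename.txt') as f:
        f.write(chunk)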
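
Note that the traceback in the question is raised inside `urllib2.urlopen` while boilerpipe's `Extractor` is being constructed (webcrawl.py, line 83), which is before any file is written, so it may also be worth guarding the download step. Below is a minimal sketch of both ideas, not a definitive fix. The names `fetch_with_retry` and `write_chunks` are hypothetical helpers, and `fetch` stands for any zero-argument callable that downloads and returns the text, for example a small wrapper around the `Extractor` call from the question. The writer uses the built-in `open`, which is already a context manager, and opens the file once rather than once per chunk.

```python
import http.client
import time


def fetch_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() and retry when the HTTP body arrives incomplete.

    `fetch` is a hypothetical zero-argument callable that returns text.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except http.client.IncompleteRead:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)  # short pause before trying again


def write_chunks(path, chunks, encoding="utf-8"):
    """Create (or overwrite) the file once and write the chunks one at a time."""
    with open(path, "w", encoding=encoding, errors="replace") as f:
        for chunk in chunks:
            f.write(chunk)
```

With these helpers, the loop body from the question could become something like `text = fetch_with_retry(download)` followed by `write_chunks(name, [text, date])`, where `download`, `name`, and `date` are placeholders for the asker's own values.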