Environment:
Package Version
----------------------------- ---------
backports.functools-lru-cache 1.5
beautifulsoup4 4.6.0
certifi 2018.4.16
chardet 3.0.4
cycler 0.10.0
Django 1.11.14
et-xmlfile 1.0.1
future 0.16.0
googlemaps 2.5.1
idna 2.6
jdcal 1.4
jenkinsapi 0.3.6
Jinja2 2.10
kiwisolver 1.0.1
lxml 4.2.1
MarkupSafe 1.0
matplotlib 2.2.2
numpy 1.14.3
openpyxl 2.5.3
pandas 0.23.0
pip 10.0.1
psycopg2 2.7.5
pymongo 3.7.0
pyparsing 2.2.0
python-dateutil 2.7.3
pytz 2018.4
PyYAML 3.12
requests 2.18.4
scipy 1.1.0
seaborn 0.8.1
selenium 3.12.0
setuptools 18.2
six 1.11.0
urllib3 1.22
web.py 0.40.dev1
wheel 0.31.1
xmldiff 1.1.1
Operating system: Windows 10
Python 2.7
Description:
When I read a large CSV file, test.csv (1.15 GB), with read_csv, Python raises an exception.
The code is as follows:
import os
import pandas as pd

file = os.path.join(DATA_PATH, "test.csv")
test_chunks = pd.read_csv(file, iterator=True, engine="python", error_bad_lines=False, sep=',')
test_chunk = test_chunks.get_chunk(5)
The interpreter raises the following exception:
Traceback (most recent call last):
  File "D:/work_code/QA_tools/autogencases/utils/csvReader.py", line 149, in <module>
    error_bad_lines=False, sep=',')
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 1024, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 2089, in __init__
    self.columns, self.num_original_columns = self._infer_columns()
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 2359, in _infer_columns
    line = self._buffered_line()
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 2530, in _buffered_line
    return self._next_line()
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 2635, in _next_line
    orig_line = self._next_iter_line(row_num=self.pos + 1)
  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 2695, in _next_iter_line
    return next(self.data)
IOError: [Errno 13] Permission denied
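However, I checked that the permissions on test.csv are fine, and so are those of its parent directory. There is also another CSV file in the same folder with the same permissions that is read correctly; the only difference is that it is just 135 MB.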
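
To help narrow this down, here is a minimal sketch of a check that reads the file without pandas. It is only a diagnostic idea, not a fix: `probe_file` and its `nbytes` parameter are hypothetical names introduced for illustration. It asks the OS whether the file is readable, then reads a few kilobytes from the start and from near the end of the file, so that a large file is touched at a large offset as well.

```python
import os


def probe_file(path, nbytes=4096):
    """Read the start and the end of a file using plain Python I/O.

    Hypothetical helper for diagnosis only. Any IOError/OSError raised by
    open() or read() is left to propagate so the original errno is visible.
    """
    result = {
        "exists": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),  # OS-level read permission check
        "size": os.path.getsize(path),
    }
    with open(path, "rb") as handle:
        head = handle.read(nbytes)
        # Seek close to the end so that big files are read at a large offset.
        handle.seek(max(result["size"] - nbytes, 0))
        tail = handle.read(nbytes)
    result["head_bytes"] = len(head)
    result["tail_bytes"] = len(tail)
    return result
```

For example, `probe_file(os.path.join(DATA_PATH, "test.csv"))` could be run on both the 1.15 GB file and the 135 MB file. If the plain read also fails with `[Errno 13]` on the large file only, the cause may lie outside pandas, though this check alone cannot confirm that.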