Using read_csv with a homemade object as the 'file'

Time: 2017-10-24 20:50:19

Tags: python pandas

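The read_csv doc says that its first argument can be 'any object with a read() method (such as a file handle or StringIO)'. My question is how to construct an object that can act in this capacity.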

import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)
    def read(self):
        return self.file.readline().rstrip()

filewrap = FileWrap(file_name)

while True:
    line = filewrap.read()
    if not line:
        break
    print (line)

df = pd.read_csv(FileWrap(file_name), header=None)
print (df)

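The output of this script is shown below.

The first three lines only show that the read method of the FileWrap object seems to work as expected. The remaining lines show that I do not understand how to construct an object with a read method that pandas can use to receive its input one line at a time. What must read do to make pandas happy?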

1,2,3
4,5,6
7,8,9
Traceback (most recent call last):
  File "temp.py", line 20, in <module>
    df = pd.read_csv(FileWrap(file_name), header=None)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 535, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:6077)
  File "pandas\parser.pyx", line 797, in pandas.parser.TextReader._get_header (pandas\parser.c:9878)
  File "pandas\parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11257)
  File "pandas\parser.pyx", line 2008, in pandas.parser.raise_parser_error (pandas\parser.c:26804)
TypeError: raise: exception class must be a subclass of BaseException
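
To make the failure above more concrete, here is a minimal sketch that uses only the standard library. It assumes the parser pulls data by calling read with a size argument and stops when it gets an empty string back, which is the usual contract for file objects; the traceback does not confirm that this is exactly what pandas does internally. The names LineAtATime and drain are hypothetical and exist only for this illustration.

```python
import io


class LineAtATime:
    """Mirrors the question's wrapper: read() takes no size and strips newlines."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def read(self):
        return self._buf.readline().rstrip()


def drain(source, chunk_size=4):
    """Hypothetical stand-in for a parser: pull fixed-size chunks until EOF."""
    chunks = []
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty string signals end of input
            break
        chunks.append(chunk)
    return "".join(chunks)


text = "1,2,3\n4,5,6\n7,8,9\n"

# A real file-like object honors the size argument and keeps the newlines.
print(repr(drain(io.StringIO(text))))

# The question's style of read() cannot be called with a size at all.
try:
    drain(LineAtATime(text))
except TypeError as exc:
    print("TypeError:", exc)
```

If that assumption holds, two things in the original read would need to change: it has to accept an optional size, and it should return the text unchanged, including the line endings, rather than a stripped line.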

1 answer:

Answer 0: (score: 2)

When pandas checks the input with is_file_like, the object must have both read and __iter__ methods to be treated as file-like, so you can try:

import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)
    def __iter__(self):
        # Return an iterator over the lines of the underlying file
        return iter(self.file)
    def read(self, *args, **kwargs):
        return self.file.read()

df = pd.read_csv(FileWrap(file_name), header=None)
print (df)
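
A slightly fuller variant of the same idea, as a sketch rather than a definitive implementation: read forwards an optional size to the underlying file, and iteration yields lines with their line endings intact. The close method and the load_frame helper are additions made for illustration and are not part of the original answer. I have not verified this against every pandas version, so treat the read_csv call as an assumption about how your installed version handles such objects.

```python
class FileWrap:
    """Thin wrapper exposing read() and iteration over an open text file."""

    def __init__(self, path):
        # newline="" leaves line endings untranslated, as the csv docs suggest
        self.file = open(path, newline="")

    def read(self, size=-1):
        # Return up to `size` characters, or everything left when size is -1.
        # An empty string tells the caller that the input is exhausted.
        return self.file.read(size)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.file.readline()
        if not line:
            raise StopIteration
        return line  # keep the trailing newline; do not strip it

    def close(self):
        self.file.close()


def load_frame(path):
    """Hypothetical helper: hand the wrapper to pandas and always close it."""
    import pandas as pd  # imported lazily so the wrapper itself has no dependency

    wrapped = FileWrap(path)
    try:
        return pd.read_csv(wrapped, header=None)
    finally:
        wrapped.close()
```

With the plain.txt file from the question, load_frame('plain.txt') would stand in for the direct pd.read_csv(FileWrap(file_name), header=None) call.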