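The read_csv documentation says that its first argument can be 'any object with a read() method (such as a file handle or StringIO)'. My question is how to build an object that can serve in that role.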
import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)

    def read(self):
        return self.file.readline().rstrip()

filewrap = FileWrap(file_name)
while True:
    line = filewrap.read()
    if not line:
        break
    print(line)

df = pd.read_csv(FileWrap(file_name), header=None)
print(df)
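The output of this script is shown below. The first three lines are only there to show that the read method of the FileWrap object appears to work as expected. The remaining lines show that I do not understand how to construct an object with a read method that pandas can use to receive its input one line at a time. What does read have to do to make pandas happy?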
1,2,3
4,5,6
7,8,9
Traceback (most recent call last):
  File "temp.py", line 20, in <module>
    df = pd.read_csv(FileWrap(file_name), header=None)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 388, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 729, in __init__
    self._make_engine(self.engine)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 922, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1389, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 535, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:6077)
  File "pandas\parser.pyx", line 797, in pandas.parser.TextReader._get_header (pandas\parser.c:9878)
  File "pandas\parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11257)
  File "pandas\parser.pyx", line 2008, in pandas.parser.raise_parser_error (pandas\parser.c:26804)
TypeError: raise: exception class must be a subclass of BaseException
Answer 0 (score: 2)
When pandas runs its is_file_like check, an object only counts as file-like if it has both a read method and an __iter__ method, so you can try:
import pandas as pd

file_name = 'plain.txt'

class FileWrap:
    def __init__(self, path):
        self.file = open(path)

    def __iter__(self):
        # Return an iterator over the lines, as a real file object does.
        return iter(self.file)

    def read(self, size=-1):
        # Accept an optional size so a caller may request the data in chunks.
        return self.file.read(size)

df = pd.read_csv(FileWrap(file_name), header=None)
print(df)
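
To see the difference between the two wrappers without involving pandas at all, here is a small standard-library sketch. The helper looks_file_like is hypothetical: it only approximates the rule described above (a read or write method plus __iter__) and is not the pandas function itself, although recent pandas versions expose a similar check as pandas.api.types.is_file_like. The class names LineOnlyWrap and FileWrap, and the use of io.StringIO in place of plain.txt, are also assumptions made for illustration. The sketch additionally assumes that a consumer may call read with a size argument, which is common for code that reads in chunks.

```python
import io


def looks_file_like(obj):
    """Hypothetical stand-in for a file-like check: the object needs a
    read (or write) method and must also define __iter__."""
    has_io_method = hasattr(obj, "read") or hasattr(obj, "write")
    return has_io_method and hasattr(obj, "__iter__")


class LineOnlyWrap:
    """Question-style wrapper: read() takes no size and returns one stripped line."""

    def __init__(self, buffer):
        self.file = buffer

    def read(self):
        return self.file.readline().rstrip()


class FileWrap:
    """Answer-style wrapper: iterable, and read() accepts an optional size."""

    def __init__(self, buffer):
        self.file = buffer

    def __iter__(self):
        return iter(self.file)

    def read(self, size=-1):
        return self.file.read(size)


data = "1,2,3\n4,5,6\n7,8,9\n"
question_style = LineOnlyWrap(io.StringIO(data))
answer_style = FileWrap(io.StringIO(data))

print(looks_file_like(question_style))  # False: there is no __iter__
print(looks_file_like(answer_style))    # True

# A consumer that asks for a fixed number of characters breaks the first wrapper.
try:
    question_style.read(4)
    size_error = None
except TypeError as exc:
    size_error = exc
print(type(size_error).__name__)        # TypeError

print(repr(answer_style.read(4)))       # '1,2,'
```

If all you need is an in-memory source, passing an io.StringIO object straight to read_csv avoids writing a wrapper at all, since the documentation quoted in the question names StringIO as an accepted input.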