The following code causes Pandas to raise a ValueError. I'm not sure why, since the same thing works fine with a plain list.
fileFields = [str(input("Please enter the column name for the pedigree field in your request file.\n")),
              str(input("Please enter the column name for the pedigree field in the Tissue Library file.\n")),
              str(input("Please enter the column name for the sourceID field in the Tissue Library file.\n")),
              str(input("Please enter the column name for the pedigree field in the Gold Standard file.\n")),
              str(input("Please enter the column name for the sourceID field in the Gold Standard file.\n"))]
dfRequests = pd.read_csv(fileInputs[0], skipinitialspace=True,
                         usecols=fileFields[0])
dfTissueLibrary = pd.read_csv(fileInputs[1], skipinitialspace=True,
                              usecols=fileFields[1:2])
dfGoldStandard = pd.read_csv(fileInputs[2], skipinitialspace=True,
                             usecols=fileFields[3:4])
Result:
Traceback (most recent call last):
  File "filepathway hidden for security", line 74, in <module>
    usecols=fileFields[0])
  File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 529, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 295, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 612, in __init__
    self._make_engine(self.engine)
  File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 747, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "filepathway hidden for security\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1154, in __init__
    col_indices.append(self.names.index(u))
ValueError: 'd' is not in list
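It looks as though Pandas is taking the string at each index of the fileFields list and turning it into a list of strings. I tried to work around this by building a list from the indexed strings after retrieving them, but that didn't work. Any suggestions?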
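The traceback is consistent with that guess. `usecols` expects a list-like of column names, but `usecols=fileFields[0]` passes a single string. The pandas version in the traceback iterated over that string, so each character (here `'d'`) was looked up as a separate column name. Recent pandas versions instead reject a bare string with a different `ValueError`. Separately, a slice stops before its end index, so `fileFields[1:2]` holds only one name and `fileFields[3:4]` likewise. The sketch below is illustrative only. It uses in-memory CSV text and made-up column names in place of the real files and user input.

```python
import io

import pandas as pd

# Stand-in for one of the input files; the column names are made up.
csv_text = "pedigree,sourceID,notes\nP1,S1,x\nP2,S2,y\n"

# Hypothetical answers a user might type at the five input() prompts.
fileFields = ["pedigree", "pedigree", "sourceID", "pedigree", "sourceID"]

# A str is a sequence of characters, which is what the old parser iterated over.
print(list(fileFields[0]))  # ['p', 'e', 'd', 'i', 'g', 'r', 'e', 'e']

# Wrap a single name in a list so usecols receives a list of column names.
df_single = pd.read_csv(io.StringIO(csv_text), usecols=[fileFields[0]])

# Slices exclude the end index: [1:3] selects indices 1 and 2.
df_pair = pd.read_csv(io.StringIO(csv_text), usecols=fileFields[1:3])

print(df_single.columns.tolist())  # ['pedigree']
print(df_pair.columns.tolist())    # ['pedigree', 'sourceID']
```

With the real files, the same two changes would be `usecols=[fileFields[0]]`, `usecols=fileFields[1:3]` and `usecols=fileFields[3:5]`.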
Answer:
"Any suggestions?"
My approach would be to use a small helper function like the one below, which keeps the process simple and safe:
import pandas as pd

def selective_read_csv(purpose, path):
    # read only the first row; its header gives the column names
    columns = list(pd.read_csv(path, nrows=1).columns.values)
    df = None
    while df is None:
        # present the user with a selection of actual columns, taking
        # out the guesswork
        file_fields = input("[%s] Enter columns as a comma-separated list %s " % (purpose, columns))
        try:
            # strip spaces so that "a, b" works as well as "a,b"
            df = pd.read_csv(path, usecols=[c.strip() for c in file_fields.split(',')])
        except ValueError as e:
            print("Sorry, %s" % e)
            df = None
    return df
df = selective_read_csv('requests file', '/tmp/data.csv')
This shows the user the columns that actually exist in the file and handles bad input gracefully:
[requests file] Enter columns as a comma-separated list [u'a', u'b'] aaa
Sorry, 'aaa' is not in list
[requests file] Enter columns as a comma-separated list [u'a', u'b']
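The exact wording of that error depends on the pandas version. One option is to check the typed names against the header yourself before calling `read_csv`. The function below is a hypothetical helper sketched for illustration; `parse_column_choice` is not part of pandas. It strips spaces around each name and raises a `ValueError` that lists every unknown column.

```python
def parse_column_choice(raw, available):
    """Turn a comma-separated string into a list of column names.

    Raises ValueError if nothing was entered or if a name is not in
    `available` (for example, the header columns read from the CSV file).
    """
    # Split on commas, trim whitespace and drop empty pieces.
    chosen = [name.strip() for name in raw.split(",") if name.strip()]
    if not chosen:
        raise ValueError("no column names entered")
    missing = [name for name in chosen if name not in available]
    if missing:
        raise ValueError("not in file: %s" % ", ".join(missing))
    return chosen


print(parse_column_choice("a, b", ["a", "b"]))  # ['a', 'b']
try:
    parse_column_choice("aaa", ["a", "b"])
except ValueError as e:
    print("Sorry, %s" % e)  # Sorry, not in file: aaa
```

Inside the retry loop, the validated list could then be passed straight to `usecols`.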
Then call this function for each file type, for example:
dfRequests = selective_read_csv('requests file', fileInputs[0])
dfTissueLibrary = selective_read_csv('tissue library', fileInputs[1])
dfGoldStandard = selective_read_csv('gold standard', fileInputs[2])
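Because the helper calls `input()` directly, it can only be exercised from a terminal. A possible variant, sketched under the assumption of Python 3 and a recent pandas, takes the prompt function as a parameter. The `ask` argument is an illustrative addition and does not appear in the answer. It lets canned responses drive the retry loop, for example in a test.

```python
import os
import tempfile

import pandas as pd


def selective_read_csv(purpose, path, ask=input):
    """Prompt via `ask` until the user names columns that exist in `path`."""
    # nrows=0 reads just the header, which is enough to list the columns.
    columns = list(pd.read_csv(path, nrows=0).columns)
    while True:
        raw = ask("[%s] Enter columns as a comma-separated list %s " % (purpose, columns))
        names = [name.strip() for name in raw.split(",")]
        try:
            return pd.read_csv(path, usecols=names)
        except ValueError as e:
            # Unknown column names: report and ask again.
            print("Sorry, %s" % e)


# Demo: write a small CSV, then answer once wrongly and once correctly.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("a,b\n1,2\n3,4\n")
    path = f.name

answers = iter(["aaa", "a"])
df = selective_read_csv("requests file", path, ask=lambda prompt: next(answers))
os.remove(path)
print(df)
```

In normal use the default `ask=input` keeps the interactive behaviour unchanged.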