Walk a directory tree and read .csv files into a dataframe using Python

Asked: 2017-08-06 17:45:40

Tags: python directory-structure os.walk

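I am trying to walk a directory tree, and for each csv encountered on the walk I want to open the file and read columns 0 and 15 into a dataframe (after which I will process it and move on to the next file). I can walk the directory tree using the following: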

rootdir = r'C:/Users/stacey/Documents/Alco/auditopt/'
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        df = pd.read_csv(fname, header=1, usecols=[0,15], parse_dates=[0], dayfirst=True, index_col=[0], names=['date', 'total_pnl_per_pos'])
        print(df)

But I get the error message:

FileNotFoundError: File b'auditopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist.

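The files I am trying to read do exist. They are in MS Excel .csv format, so I don't know whether that is the problem. If it is, could someone tell me how to read an MS Excel .csv into a dataframe?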

The full stack trace is below:

Found directory: C:/Users/stacey/Documents/Alco/auditopt/
Found directory: C:/Users/stacey/Documents/Alco/auditopt/roll_597_oe_2017-03-10
        tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv
Traceback (most recent call last):

  File "<ipython-input-24-3753e367432d>", line 1, in <module>
    runfile('C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py', wdir='C:/Users/stacey/Documents/scripts')

  File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 49, in <module>
    main()

  File "C:/Users/stacey/Documents/scripts/Pair_Results_Code_1.0.py", line 36, in main
    df = pd.read_csv(fname, header=1, usecols=[0,15],parse_dates=[0], dayfirst=True,index_col=[0], names=['date', 'total_pnl_per_pos'])

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)

  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)

  File "pandas\parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8449)

FileNotFoundError: File b'tradeopt.os-pnl.BBG_XASX_ARB_S-BBG_XTKS_7240_S.csv' does not exist

1 Answer:

Answer 0 (score: 1)

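When reading a file, you need to supply its full path. os.walk does not give you full paths for files; it yields bare file names, so you need to build the path yourself.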

Use os.path.join to simplify this.

import os
# dirName and fname are the loop variables from the os.walk loop in the question
full_path = os.path.join(dirName, fname)
df = pd.read_csv(full_path, ...)
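
To put the pieces together, here is a minimal sketch of the whole walk using only the standard library. The helper name iter_csv_paths and the .csv extension filter are assumptions added for illustration; they are not part of the original question or answer.

```python
import os


def iter_csv_paths(rootdir):
    """Yield the full path of every .csv file found under rootdir.

    Hypothetical helper for illustration, not part of the original answer.
    """
    for dir_name, _subdirs, file_names in os.walk(rootdir):
        for fname in file_names:
            # os.walk yields bare file names; skip anything that is not a CSV.
            if not fname.lower().endswith(".csv"):
                continue
            # Join the containing directory with the file name so the path
            # resolves regardless of the current working directory.
            yield os.path.join(dir_name, fname)
```

Each yielded path can then be passed to pd.read_csv in place of fname, with the same keyword arguments as in the question. Keeping the path discovery separate from the parsing also makes it easy to print the paths first and confirm they point where you expect.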