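Looping CSV concat in Python pandas

Time: 2015-09-04 04:09:12

Tags: python-2.7 csv pandas

I have multiple folders, and each folder contains CSVs. I am trying to concatenate the CSVs in each subdirectory and then export the result, so that I end up with as many output files as there are folders: Folder1.csv, Folder2.csv, ... Folder99.csv. This is what I have so far: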

import os
from glob import glob
import pandas as pd
import numpy as np



rootDir = 'D:/Data'
OutDirectory = 'D:/OutPut'
os.chdir(rootDir)

# The directory has folders as follows
# D:/Data/Folder1
# D:/Data/Folder2
# D:/Data/Folder3
# ....
# .....
# D:/Data/Folder99

# Each folders (Folder1, Folder2,..etc.) has many csvs.

frame = pd.DataFrame()
list_ = []
for (dirname, dirs, files) in os.walk(rootDir):
for filename in files:
    if filename.endswith('.csv'):
        df = pd.read_csv(filename,index_col=None, na_values=['-999'], delim_whitespace= True, header = 0,  skiprows = 2)
        OutFile = '%s.csv' % OutputFname
        list_.append(df)
        frame = pd.concat(list_)

        df.to_csv(OutDirectory+OutFile, sep = ',', header= True)

I get the following error:

IOError: File file200150101.csv does not exist

1 Answer:

Answer 0 (score: 1)

You need to join dirname and filename to get the full path of the file. Change the line like this:

df = pd.read_csv(os.path.join(dirname, filename) ,index_col=None, na_values=['-999'], delim_whitespace= True, header = 0, skiprows = 2)
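The reason the original call fails is that os.walk yields bare file names in `files`; they are relative to `dirname`, not to the current working directory. A minimal sketch of the path handling on its own, assuming a hypothetical helper named `csv_paths` that is not part of the original code:

```python
import os


def csv_paths(root_dir):
    """Yield the full path of every .csv file below root_dir.

    csv_paths is a hypothetical helper name used only for illustration.
    """
    for dirname, dirs, files in os.walk(root_dir):
        for filename in files:
            if filename.endswith('.csv'):
                # filename alone is relative to dirname, so join the two
                yield os.path.join(dirname, filename)
```

Each yielded path can be passed to pd.read_csv directly, regardless of which directory the script was started from.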

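Edit
I don't know how pandas works, as I have never used it. But I think your problem is that everything you want to do is defined in the inner loop, which only loops over the files (at least the indentation looks that way, but that could also be a formatting issue from pasting your code into SO).

I rewrote your code and fixed a few things that I think might be the problem:

  • First, I renamed the variables that started with a capital letter,
    because variables with capital letters always look odd to me.
  • I moved the list variable into the outer loop, because it should be
    reset every time a new directory is entered, since you want to merge all the CSVs per folder.
  • Finally, I fixed the indentation. In Python, indentation tells the interpreter which statements are inside or outside a loop.

My code now looks like this. You may need to change a few things, as I can't test it right now: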

import os
from glob import glob
import pandas as pd
import numpy as np



rootDir = 'D:/Data'
outDir = 'D:/OutPut'
os.chdir(rootDir)
dirs = os.listdir(rootDir)

for dirname in dirs:
  # the outer loop iterates over the directories; the current one is stored in dirname
  frames = [] # collect the csv data per directory, not globally
  files = glob('%s/*.csv' % (dirname))
  for filename in files:
    # the inner loop iterates over the files in the 'dirname' folder
    df = pd.read_csv(filename,index_col=None, na_values=['-999'], delim_whitespace= True, header = 0,  skiprows = 2)
    frames.append(df) # do that for every file
  # at this point, all files in the current directory have been processed
  if not frames:
    continue # nothing to merge here (os.listdir also returns plain files)
  outFile = '%s.csv' % dirname # define the name for the output csv
  frame = pd.concat(frames) # stack the DataFrames of this directory row-wise
  frame.to_csv(os.path.join(outDir, outFile), sep = ',', header= True) # save the data
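The same per-folder approach can also be wrapped in a function that builds full paths itself and therefore does not depend on os.chdir. This is only a sketch under stated assumptions: `concat_folder_csvs` is a hypothetical name, `out_dir` is assumed to exist already, `sep=r'\s+'` is used in place of `delim_whitespace=True` (the two are equivalent, and newer pandas releases deprecate the latter), and dropping the index via `ignore_index=True` and `index=False` is a choice of this sketch rather than something the question asked for.

```python
import os
from glob import glob

import pandas as pd


def concat_folder_csvs(root_dir, out_dir):
    """Write one combined CSV per immediate subfolder of root_dir.

    Hypothetical helper for illustration. Returns the list of output
    paths that were written. out_dir must already exist.
    """
    written = []
    for name in sorted(os.listdir(root_dir)):
        folder = os.path.join(root_dir, name)
        if not os.path.isdir(folder):
            continue  # os.listdir also returns plain files
        frames = []  # reset for every folder
        for path in sorted(glob(os.path.join(folder, '*.csv'))):
            # same parsing options as in the question; sep=r'\s+'
            # stands in for delim_whitespace=True
            frames.append(pd.read_csv(path, index_col=None,
                                      na_values=['-999'], sep=r'\s+',
                                      header=0, skiprows=2))
        if not frames:
            continue  # pd.concat raises ValueError on an empty list
        out_path = os.path.join(out_dir, '%s.csv' % name)
        # ignore_index / index=False are choices made by this sketch
        pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the layout from the question, it would be called as concat_folder_csvs('D:/Data', 'D:/OutPut').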