Question

I've hit a wall. So far I have the following code:

import os

# define variables of each directory to be used
parent_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\'
orig_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\Original\\'
new_data_dir = 'C:\\Users\\Admin\\Documents\\Python Scripts\\Data\\New\\'

# Create list of original data files from orig_data_dir
orig_data = []
for root, dirs, files in os.walk(orig_data_dir):
    for file in files:
        if file.endswith('.csv'):
            orig_data.append(file)
# It populates the file names located in the orig_data_dir
# orig_data = ['Test1.csv', 'Test2.csv', 'Test3.csv'] 

# Create list of new data files from new_data_dir
new_data = []
for root, dirs, files in os.walk(new_data_dir):
    for file in files:
        if file.endswith('.csv'):
            new_data.append(file)
# It populates the file names located in the new_data_dir
# new_data = ['Test1_2.csv', 'Test2_2.csv', 'Test3_2.csv']

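I have three csv files in each directory. The csv files ending in _2.csv contain the new data, and for each pair I want to append the old data to the new csv file. Every csv file has exactly the same rows. What I want to do is the following: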

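1. Using the lists I created, read Test1.csv and Test1_2.csv into one dataframe (I'm open to a better way if there is one). Next iteration: Test2.csv and Test2_2.csv, etc.
2. Do some pandas stuff
3. Write a new file called Test_Compiled_1.csv (next iteration: Test_Compiled_2.csv, etc.)
4. Repeat until every csv pair from the two directories has been combined into its own new csv file.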
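A minimal sketch of steps 1 to 4, assuming pandas is installed and that every original file name ends in a number (Test1.csv, Test2.csv, ...). The function name compile_pairs is hypothetical, and the "pandas stuff" is stood in for by a plain row-wise pd.concat that puts the old rows after the new ones before writing Test_Compiled_<n>.csv into out_dir.

```python
import os
import re

import pandas as pd


def compile_pairs(orig_dir, new_dir, out_dir):
    """Combine each <stem>.csv in orig_dir with <stem>_2.csv in new_dir.

    Assumption: original names end in a number (Test1.csv), which is reused
    for the output name (Test_Compiled_1.csv). Returns the paths written.
    """
    written = []
    for filename in sorted(os.listdir(orig_dir)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() != '.csv':
            continue
        number = re.search(r'(\d+)$', stem)
        new_path = os.path.join(new_dir, stem + '_2' + ext)
        if number is None or not os.path.isfile(new_path):
            continue  # no trailing number or no matching new file: skip it
        df_new = pd.read_csv(new_path)
        df_old = pd.read_csv(os.path.join(orig_dir, filename))
        # Placeholder for "pandas stuff": append the old rows after the new ones
        combined = pd.concat([df_new, df_old], ignore_index=True)
        out_path = os.path.join(out_dir, 'Test_Compiled_%s.csv' % number.group(1))
        combined.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the directories from the question this would be called as compile_pairs(orig_data_dir, new_data_dir, parent_data_dir). Building each _2 name from the original name, rather than zipping the two lists, avoids relying on both directory listings coming back in the same order.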

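Edit: I have 1000 csv files. With that in mind, I need to:

1. Read the first file pair into the same dataframe. First iteration: Test1.csv located in orig_data_dir and Test1_2.csv located in new_data_dir
2. Do the pandas stuff
3. Write the populated dataframe to a new file in parent_data_dir
4. Repeat for each file pair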

The second iteration would be: Test2.csv and Test2_2.csv

The 1000th iteration would be: Test1000.csv and Test1000_2.csv

I hope this helps clarify.
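os.walk and os.listdir return names in arbitrary order, and a plain sorted() is lexicographic, so Test10.csv sorts before Test2.csv. If the pairs should be processed in the numeric order described above (1, 2, ..., 1000), a natural sort key is one option; natural_key and paired_in_order are hypothetical helper names, and the pairing assumes the question's <name>.csv / <name>_2.csv convention.

```python
import os
import re


def natural_key(filename):
    """Sort key comparing embedded numbers numerically, so Test2 < Test10."""
    # re.split with a capture group keeps the digit runs; turn those into ints
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', filename)]


def paired_in_order(orig_names):
    """(original, new) name pairs in numeric order, assuming the question's
    <name>.csv / <name>_2.csv naming convention."""
    pairs = []
    for name in sorted(orig_names, key=natural_key):
        stem, ext = os.path.splitext(name)
        pairs.append((name, stem + '_2' + ext))
    return pairs


print(sorted(['Test10.csv', 'Test2.csv', 'Test1.csv']))
# ['Test1.csv', 'Test10.csv', 'Test2.csv']
print(paired_in_order(['Test10.csv', 'Test2.csv', 'Test1.csv']))
# [('Test1.csv', 'Test1_2.csv'), ('Test2.csv', 'Test2_2.csv'), ('Test10.csv', 'Test10_2.csv')]
```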

Answer 1

Something like this may help you:

endswith

Note: the separator ("sep") used in your csv files may be different.
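If the separator isn't known in advance, pd.read_csv can try to detect it: with sep=None, the Python parsing engine sniffs the delimiter using the standard library's csv.Sniffer. A small sketch with in-memory stand-ins for two files; if you do know the separator, passing it explicitly (for example sep=';') is simpler.

```python
import io

import pandas as pd

# Two in-memory stand-ins for csv files that use different separators
comma_csv = io.StringIO("x,y\n1,2\n3,4\n")
semicolon_csv = io.StringIO("x;y\n1;2\n3;4\n")

# sep=None asks the Python engine to detect the separator with csv.Sniffer
df_comma = pd.read_csv(comma_csv, sep=None, engine='python')
df_semicolon = pd.read_csv(semicolon_csv, sep=None, engine='python')

print(df_comma.equals(df_semicolon))  # True
```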

Edit: I have changed fnmatch.filter to {{1}}, so now you can use any pattern you want to match the files you need in the different directories.
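For comparison, fnmatch.filter applies a shell-style pattern (*, ?, [seq]) to a list of names, which makes it easy to use a different pattern per directory. A sketch with a hypothetical helper list_matching that returns sorted full paths, unlike the question's loops, which keep only bare file names:

```python
import fnmatch
import os


def list_matching(directory, pattern):
    """Sorted full paths of the entries in `directory` whose names match the
    shell-style `pattern` (e.g. 'Test*_2.csv'). Hypothetical helper."""
    names = fnmatch.filter(os.listdir(directory), pattern)
    return sorted(os.path.join(directory, name) for name in names)
```

For example, list_matching(new_data_dir, 'Test*_2.csv'). Note that 'Test*.csv' would also match Test1_2.csv, so that pattern only tells the two kinds of file apart because they live in separate directories.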

Answer 2

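The best advice is to give the files the same names in each directory, and to put only the useful data in those directories. Here is a solution that works with the different names: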

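import os
import pandas as pd

for filename in os.listdir(orig_data_dir):
    name, ext = os.path.splitext(filename)
    # construct the new-data filename from the old one: Test1.csv -> Test1_2.csv
    filename_2 = os.path.join(new_data_dir, name + '_2' + ext)
    if os.path.isfile(filename_2):
        df_Orig = pd.read_csv(os.path.join(orig_data_dir, filename), index_col=0)
        df_New = pd.read_csv(filename_2, index_col=0)
        # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
        pd.concat([df_Orig, df_New]).to_csv(os.path.join(orig_data_dir, filename))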

Here I accumulate the results in the original files, so only one loop is needed.
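If you follow the advice above and give the files identical names in both directories, pairing reduces to a set intersection of the two listings. A sketch with a hypothetical helper common_csv_names:

```python
import os


def common_csv_names(dir_a, dir_b):
    """CSV file names present in both directories, sorted.

    Assumes the same-name convention suggested above, e.g. Test1.csv in
    both the Original and the New directory. Hypothetical helper.
    """
    names_a = {f for f in os.listdir(dir_a) if f.endswith('.csv')}
    names_b = {f for f in os.listdir(dir_b) if f.endswith('.csv')}
    return sorted(names_a & names_b)
```

Each returned name can then be read from both sides with os.path.join(orig_data_dir, name) and os.path.join(new_data_dir, name), and files that exist in only one directory are skipped automatically.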

Iteratively read multiple csv files from different directories into a dataframe, then write to a new csv
