Question

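I'm having some trouble processing CSV files in Python (3.5). Previously I worked with a single file without problems, but now I have more than 100 files in one folder. So, my goals are:

Parse all *.csv files in a directory.
Remove the first 6 lines from each file. The files contain data like this: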

"nu(Ep), 2.6.8"
"Date: 2/10/16, 11:18:21 AM"
19
Ep,nu
0.0952645,0.123776,
0.119036,0.157720, 
...
0.992060,0.374300,

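Save each file separately (e.g. with "_edited" appended to the name), so that only the numbers are saved.
Optionally: my data for one material is split into two parts, for example Ag(0-1)_s.csv and Ag(1-4)_s.csv (after steps 1-3 they should look like Ag(*)_edited.csv). How can I merge these two files so that the data from (1-4) is appended to the end of (0-1), and save the result in a third file? My code so far is as follows: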

    import os, sys
    import csv
    import re
    import glob
    import fileinput


    def get_all_files(directory, extension='.csv'):
        dir_list = os.listdir(directory)
        csv_files = []
        for i in dir_list:
            if i.endswith(extension):
                csv_files.append(os.path.realpath(i))
        return csv_files

    csv_files = get_all_files('/Directory/Path/Here')

    # Here is the problem with the CSVs: I don't know how to scan the files
    # that are in the list "csv_files".

    for n in csv_files:
        #print(n)
        lines = [] # empty, because I don't know how to fill it properly
                   # for each file
        input = open(n, 'r')
        reader = csv.reader(n)
        temp = []
        for i in range(5):
            next(reader)
            #a for loop for here regarding rows?
            #for row in n: ???
            #  ???
        input.close()
        #newfilename = "".join(n.split(".csv")) + "edited.csv"
        #newfilename can be used within open() below:
        with open(n + '_edited.csv', 'w') as nf:
            writer = csv.writer(nf)
            writer.writerows(lines)

Thanks!

Answer 1

This is the fastest way I can think of. If you have a solid-state drive, you could throw multiprocessing at this for a further performance boost.

import glob
import os

for fpath in glob.glob('path/to/directory/*.csv'):
    # File name without its extension, e.g. 'Ag(0-1)_s'
    fname = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    out_path = os.path.join('path/to/dir', fname + "_edited" + os.path.extsep + 'csv')
    with open(fpath) as infile, open(out_path, 'w') as outfile:
        for _ in range(6):  # skip the 6 header lines
            infile.readline()
        for line in infile:  # copy the remaining lines unchanged
            outfile.write(line)
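
For the multiprocessing idea mentioned above, here is a minimal sketch, assuming each file can be processed independently of the others. The names `strip_header` and `strip_all` and their parameters are hypothetical helpers made up for this example; only `glob`, `os`, `functools.partial` and `multiprocessing.Pool` come from the standard library. Whether it actually runs faster depends on your disk and the file sizes, so treat it as something to measure rather than a guaranteed win.

```python
import glob
import os
from functools import partial
from multiprocessing import Pool


def strip_header(src_path, dst_dir, n_header=6):
    """Copy src_path into dst_dir as <name>_edited.csv, minus the first n_header lines.

    Hypothetical helper; n_header=6 follows the question's description.
    """
    stem = os.path.splitext(os.path.basename(src_path))[0]
    dst_path = os.path.join(dst_dir, stem + "_edited.csv")
    with open(src_path) as infile, open(dst_path, "w") as outfile:
        for _ in range(n_header):  # skip the header lines
            infile.readline()
        outfile.writelines(infile)  # copy the remaining lines unchanged
    return dst_path


def strip_all(src_dir, dst_dir, processes=None):
    """Run strip_header over every *.csv in src_dir using a pool of worker processes."""
    paths = sorted(glob.glob(os.path.join(src_dir, "*.csv")))
    with Pool(processes) as pool:
        # partial() pins dst_dir so each worker only receives a source path
        return pool.map(partial(strip_header, dst_dir=dst_dir), paths)
```

Call `strip_all('path/to/directory', 'path/to/dir')` from under an `if __name__ == '__main__':` guard, which multiprocessing needs on platforms that spawn new interpreter processes. Writing to a separate output directory keeps a second run from picking up the `_edited` files as input.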

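The optional merging step from the question is not covered by the loop above. One possible approach, sketched under the assumption that both edited files are small enough to read into memory and contain only data rows, is to write the first file and then the second into a third one. `merge_files` is a hypothetical helper name, not part of any library.

```python
def merge_files(first_path, second_path, merged_path):
    """Write the contents of first_path, then second_path, into merged_path."""
    with open(merged_path, "w") as outfile:
        for path in (first_path, second_path):
            with open(path) as infile:
                text = infile.read()
            # Make sure the last row of one file does not run into
            # the first row of the next one.
            if text and not text.endswith("\n"):
                text += "\n"
            outfile.write(text)
```

For the files in the question this would be called as something like `merge_files('Ag(0-1)_s_edited.csv', 'Ag(1-4)_s_edited.csv', 'Ag_merged.csv')`, where the output name `Ag_merged.csv` is only an illustration.
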
Multiple editing of CSV files in Python
