How to concatenate 600 CSV files into one, using only the files larger than 2 KB

Time: 2018-11-05 09:40:25

Tags: python csv

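I managed to concatenate all 600 CSV files into one, but I only want to include the files larger than 2 KB, because the smaller ones contain no data.

Is there a way to add that check to this code?

The code I used to create the single CSV file is: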

from pathlib import Path
import csv

indir = Path(r'C:\Users\gerardchurch\Documents\Data\dev_en')
outfile = Path(r'C:\Users\gerardchurch\Documents\Data\output.csv')


def find_header_from_all_files(indir):
    columns = set()
    print("Looking for column names in", indir)
    for f in indir.glob('*.csv'):
        with f.open() as sample_csv:
            sample_reader = csv.DictReader(sample_csv)
            try:
                first_row = next(sample_reader)
            except StopIteration:
                print("File {} doesn't contain any data. Double check this".format(f))
                continue
            else:
                columns.update(first_row.keys())
    return columns


columns = find_header_from_all_files(indir)
print("The columns are:", sorted(columns))

with outfile.open('w', newline='') as outf:  # newline='' avoids blank rows on Windows
    wr = csv.DictWriter(outf, fieldnames=list(columns))
    wr.writeheader()
    for inpath in indir.glob('*.csv'):
        print("Parsing", inpath)
        with inpath.open() as infile:
            reader = csv.DictReader(infile)
            wr.writerows(reader)
print("Done, find the output at", outfile)

Thanks.

1 Answer:

Answer 0: (score: 0)

Use the os module. You can write a size check like this:

import os

if os.stat(fname).st_size > dsize:
    func(fname)

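where fname is the file path, dsize is the lower size threshold in bytes, and func(fname) is whatever processing you want to run on a file once it passes the check.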

The docs for os.stat are quite straightforward.
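Applied to the code in the question, the same check could look roughly like the sketch below. Since the question already uses pathlib, it calls Path.stat().st_size, which reports the same size in bytes as os.stat(fname).st_size. The helper names csv_files_over and concat_csvs and the constant MIN_SIZE_BYTES are made up for illustration, and treating "2 KB" as 2 * 1024 bytes is an assumption (use 2000 if that is what you meant). Unlike the original, it takes the column names from each file's header row rather than from its first data row.

```python
import csv
from pathlib import Path

# Assumption: "2 KB" means 2 * 1024 bytes.
MIN_SIZE_BYTES = 2 * 1024


def csv_files_over(indir, min_size=MIN_SIZE_BYTES):
    """Return the *.csv files in indir that are strictly larger than min_size bytes."""
    return sorted(p for p in Path(indir).glob('*.csv')
                  if p.stat().st_size > min_size)


def concat_csvs(paths, outfile):
    """Write the rows of every file in paths to outfile under a combined header."""
    # First pass: collect column names in the order they are first seen.
    columns = []
    for path in paths:
        with path.open(newline='') as f:
            for name in csv.DictReader(f).fieldnames or []:
                if name not in columns:
                    columns.append(name)

    # Second pass: copy the rows; columns missing from a file are left empty.
    with Path(outfile).open('w', newline='') as outf:
        wr = csv.DictWriter(outf, fieldnames=columns)
        wr.writeheader()
        for path in paths:
            with path.open(newline='') as f:
                wr.writerows(csv.DictReader(f))
```

With the variables from the question, this would be called as concat_csvs(csv_files_over(indir), outfile). Building the file list once up front also means the size check runs a single time per file, even though the files are read twice.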