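I managed to concatenate all 600 csv files into one, but I only want to include files larger than 2KB, because the smaller ones have no data in them.
Is there a way to add this to the code below?
The code I used to create the single csv file is: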
    from pathlib import Path
    import csv

    indir = Path(r'C:\Users\gerardchurch\Documents\Data\dev_en')
    outfile = Path(r"C:\Users\gerardchurch\Documents\Data\output.csv")


    def find_header_from_all_files(indir):
        # Collect the union of column names across every csv file in indir.
        columns = set()
        print("Looking for column names in", indir)
        for f in indir.glob('*.csv'):
            with f.open() as sample_csv:
                sample_reader = csv.DictReader(sample_csv)
                try:
                    first_row = next(sample_reader)
                except StopIteration:
                    print("File {} doesn't contain any data. Double check this".format(f))
                    continue
                else:
                    columns.update(first_row.keys())
        return columns


    columns = find_header_from_all_files(indir)
    print("The columns are:", sorted(columns))

    # newline='' stops the csv module from writing blank lines between rows on Windows.
    with outfile.open('w', newline='') as outf:
        wr = csv.DictWriter(outf, fieldnames=list(columns))
        wr.writeheader()
        for inpath in indir.glob('*.csv'):
            print("Parsing", inpath)
            with inpath.open() as infile:
                reader = csv.DictReader(infile)
                wr.writerows(reader)

    print("Done, find the output at", outfile)
Thanks.

Answer 0 (score: 0)
Use the os module. You can create a check like this:
    import os

    if os.stat(fname).st_size > dsize:
        func(fname)
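where fname is the file path, dsize is the lower threshold on file size (in bytes), and func(fname) is whatever processing you want to run on a file once it passes the check.
The docs for os.stat are pretty straightforward.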
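
Applied to the question's code, the same size check might look like the sketch below. The helper name csv_files_larger_than and the constant MIN_SIZE_BYTES are made up for illustration, and reading "2KB" as 2048 bytes is an assumption. It uses Path.stat().st_size, which is the pathlib equivalent of os.stat(fname).st_size.

```python
from pathlib import Path

# Assumption: "2KB" is read as 2 * 1024 bytes; adjust to taste.
MIN_SIZE_BYTES = 2 * 1024


def csv_files_larger_than(indir, min_size=MIN_SIZE_BYTES):
    """Yield the *.csv paths in indir whose size on disk exceeds min_size bytes.

    Hypothetical helper, not part of the original code.
    """
    for path in sorted(Path(indir).glob('*.csv')):
        # st_size is the file size in bytes.
        if path.stat().st_size > min_size:
            yield path
```

With this in place, both loops in the question could iterate over csv_files_larger_than(indir) instead of indir.glob('*.csv'), so that small files are skipped both when collecting column names and when writing rows.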