我正在尝试编写一些代码来对非常大的excel文件进行下采样。它需要准确复制前4行,然后在第5行开始每40行。我目前有这个
import os
import string
import shutil
import datetime
folders = os.listdir('./')
names = [s for s in folders if "csv" in s]
zips = [s for s in folders if "zip" in s]
for folder in names:
filename = folder
shutil.move(folder, './Archive')
with open(filename) as f:
counter = 0
for line in f:
counter += 1
f_out = open('./DownSampled/' + folder + '.csv', 'w')
if counter < 5:
f_out.write(line)
elif (counter+35) % 40 == 0:
f_out.write(line)
f_out.close()
它将文件移动到Archive文件夹,但是没有创建一个缩减版本的版本,我在这里做错了什么想法?
答案 0 :(得分:1)
您在前一个文件的每次迭代中都会覆盖该文件。将open(...)
移出for
循环:
with open(filename) as f, open('./DownSampled/' + folder + '.csv', 'w') as f_out:
for i, line in enumerate(f):
if i < 5:
f_out.write(line)
elif (i+35) % 40 == 0:
f_out.write(line)
更重要的是,custom history tables for logging in SQL Server可以取代您的计数逻辑。