I have a folder full of CSV files, each of which looks like this:
TPN,201203,by the congress,3,0.000001559523908541200542298447130
TPN,201312,by the congress,2,0.000001975728179317089554819047995
TPN,201308,by the congress,2,0.000002130556224313481520620588417
CR,200910,by the congress,10,0.000001254229103759238181242376639
CR,200911,by the congress,5,6.974221464170843876612631794E-7
MED,200507,by the congress,2,0.000004113271264069958517659301854
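The snippet below is a quick check of this row format: a minimal Python 3 sketch that parses two of the sample rows. The field names (code, date, phrase, count, freq) and the helper parse_row are hypothetical. The sketch assumes the second column is a zero-padded YYYYMM date, so comparing it as an int orders rows chronologically. It uses Decimal so the last column keeps all of its digits, including values written in exponent form such as 6.97...E-7.

```python
import csv
import io
from decimal import Decimal

# Two of the sample rows above, as they would appear in a file.
SAMPLE = """CR,200910,by the congress,10,0.000001254229103759238181242376639
CR,200911,by the congress,5,6.974221464170843876612631794E-7
"""


def parse_row(row):
    """Split one CSV row into typed fields (column meanings are assumed)."""
    code, date, phrase, count, freq = row
    # YYYYMM compares chronologically as an int: year first, month zero-padded.
    # Decimal keeps every digit of the frequency and accepts "E-7" notation.
    return code, int(date), phrase, int(count), Decimal(freq)


rows = [parse_row(r) for r in csv.reader(io.StringIO(SAMPLE))]
for r in rows:
    print(r)
```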
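I want a script that goes through each file, finds the minimum date value in that file, and then writes every row of that file that contains that date value to a new file (so if two rows have the same date value, both should be written). I have this: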
import csv
import os
import codecs
import unicodecsv
folder = '/Users/xyz/Desktop/TextAnalysis/PointsOfOrigin/trigramsdated/'
c = csv.writer(open("trigrampointsoforigin.csv", "a"))
for file in os.listdir (folder):
    with open(os.path.join(folder, file), mode='rU') as f:
        m=min(int(line[1]) for line in unicodecsv.reader(f, encoding='utf-8', errors='replace'))
        f.seek(0)
        for line in unicodecsv.reader(f):
            if int(line[1])==m:
                print line
                c.writerow(line)
print "All done."
But for some strange reason, it only writes the last row of each CSV to the "trigrampointsoforigin.csv" file.
Any help is much appreciated.
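For comparison, here is a minimal Python 3 sketch of the whole task. It rests on a few assumptions: every file is UTF-8, only files ending in .csv hold data, and the date is always the second column. The function name write_earliest_rows is hypothetical. Each file is read once into a list, so there is no seek(0) and no second reader. The output file is opened in a with block so it is flushed and closed. The demo at the bottom splits the sample rows above into two files in a temporary folder.

```python
import csv
import os
import tempfile


def write_earliest_rows(folder, out_path):
    """Append to out_path every row whose date equals its file's minimum date."""
    with open(out_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for name in sorted(os.listdir(folder)):
            if not name.endswith(".csv"):
                continue  # assumption: only *.csv files hold data
            path = os.path.join(folder, name)
            with open(path, newline="", encoding="utf-8", errors="replace") as f:
                # Read the file once, skipping blank lines.
                rows = [row for row in csv.reader(f) if row]
            if not rows:
                continue  # empty file: nothing to write
            earliest = min(int(row[1]) for row in rows)
            for row in rows:
                if int(row[1]) == earliest:
                    writer.writerow(row)


# Demo: the sample rows from the question, split into two files.
SAMPLE_FILES = {
    "a.csv": [
        "TPN,201203,by the congress,3,0.000001559523908541200542298447130",
        "TPN,201312,by the congress,2,0.000001975728179317089554819047995",
        "TPN,201308,by the congress,2,0.000002130556224313481520620588417",
    ],
    "b.csv": [
        "CR,200910,by the congress,10,0.000001254229103759238181242376639",
        "CR,200911,by the congress,5,6.974221464170843876612631794E-7",
        "MED,200507,by the congress,2,0.000004113271264069958517659301854",
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "trigramsdated")
    os.mkdir(data_dir)
    for name, lines in SAMPLE_FILES.items():
        with open(os.path.join(data_dir, name), "w", newline="", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    out_path = os.path.join(tmp, "trigrampointsoforigin.csv")
    write_earliest_rows(data_dir, out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        result = list(csv.reader(f))

for row in result:
    print(row)
```

Against the real data this would be called as write_earliest_rows(folder, "trigrampointsoforigin.csv"). Holding one file's rows in memory is cheap at this size. For very large files a single pass over the rows is an alternative.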
Answer 0 (score: 0)
Does line[1] actually print the date value? Either way, you can avoid the inner loop:
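import copy
import os
import unicodecsv

# folder and c are defined as in the question
for file in os.listdir(folder):
    with open(os.path.join(folder, file), mode='rU') as f:
        minline = None  # no row seen yet; a [-1, -1] sentinel would never be replaced
        for line in unicodecsv.reader(f):
            if minline is None or int(line[1]) < int(minline[1]):
                # Replace with new minline
                minline = copy.copy(line)
        # Note: this keeps only the first row with the smallest date
        if minline is not None:
            print minline
            c.writerow(minline)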
You need to import copy.
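The single-pass idea in the answer can also keep ties, which the question asks for. Below is a minimal Python 3 sketch with a hypothetical helper, earliest_rows. When a smaller date appears, the kept list is reset. When the same date appears again, the row is appended. The input is a made-up variant of the sample in which two rows share the earliest date.

```python
import csv
import io


def earliest_rows(rows):
    """One pass over rows: return (min_date, every row with that date)."""
    best_date = None
    best_rows = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        date = int(row[1])
        if best_date is None or date < best_date:
            # New minimum: forget the rows kept for the old one.
            best_date = date
            best_rows = [row]
        elif date == best_date:
            # Tie with the current minimum: keep this row as well.
            best_rows.append(row)
    return best_date, best_rows


# Hypothetical input in which two rows share the earliest date.
SAMPLE = """TPN,201203,by the congress,3,0.0000015
TPN,201312,by the congress,2,0.0000019
TPN,201203,of the congress,1,0.0000002
"""

min_date, kept = earliest_rows(csv.reader(io.StringIO(SAMPLE)))
print(min_date)
for row in kept:
    print(row)
```

Python's csv.reader returns a new list for each row, so the rows can be stored directly without copy.copy.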