Find the minimum date in a CSV and print all matching rows to a new file in Python

Posted: 2015-02-16 00:17:30

Tags: python csv min

I have a folder full of CSV files, each of which looks like this:

TPN,201203,by the congress,3,0.000001559523908541200542298447130
TPN,201312,by the congress,2,0.000001975728179317089554819047995
TPN,201308,by the congress,2,0.000002130556224313481520620588417
CR,200910,by the congress,10,0.000001254229103759238181242376639
CR,200911,by the congress,5,6.974221464170843876612631794E-7
MED,200507,by the congress,2,0.000004113271264069958517659301854

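I want a script that goes through each file, finds the minimum date value in that file, and then writes every row of that file containing that date value to a new file (so if two rows have the same date value, both should be written). I have this: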

import csv
import os
import codecs 
import unicodecsv

folder = '/Users/xyz/Desktop/TextAnalysis/PointsOfOrigin/trigramsdated/'

c = csv.writer(open("trigrampointsoforigin.csv", "a"))

for file in os.listdir (folder):
    with open(os.path.join(folder, file), mode='rU') as f:
        m=min(int(line[1]) for line in unicodecsv.reader(f, encoding='utf-8', errors='replace'))
        f.seek(0)
        for line in unicodecsv.reader(f):
            if int(line[1])==m:
                print line
                c.writerow(line)

print "All done."

But for some strange reason, it only writes the last row of each CSV to the "trigrampointsoforigin.csv" file.

Any help is greatly appreciated.

1 answer:

Answer 0 (score: 0)

Does line[1] actually give you the date value? In any case, you can avoid the second loop over the file:

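import copy
import os
import unicodecsv

# `folder` and the writer `c` are defined as in the question.
for file in os.listdir(folder):
    with open(os.path.join(folder, file), mode='rU') as f:
        minline = None  # None means no minimum found yet
        for line in unicodecsv.reader(f):
            if minline is None or int(line[1]) < int(minline[1]):
                # Replace with new minline
                minline = copy.copy(line)
        if minline is not None:  # skip empty files
            print minline
            c.writerow(minline)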

You need to import copy.
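For comparison, here is a minimal Python 3 sketch of what the question asks for, done in a single pass per file. It keeps the current minimum date together with every row tied with it, so all rows sharing the minimum date are written, not just one. Assumptions: it uses the standard-library csv module instead of unicodecsv, the date is always the second column, the inputs are UTF-8, and only names ending in .csv are read (so files such as macOS .DS_Store are skipped). The function names rows_with_min_date and collect_min_rows are hypothetical. Note also that in the sample shown in the question, the smallest date, 200507, occurs only in the last row (MED), so for that sample a single output row taken from the last line is the correct result.

```python
import csv
import os


def rows_with_min_date(rows, date_col=1):
    """Return every row whose date column equals the smallest date seen.

    Single pass: track the current minimum and all rows tied with it.
    """
    best = None
    keep = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        date = int(row[date_col])
        if best is None or date < best:
            best = date
            keep = [row]  # new minimum: discard rows kept so far
        elif date == best:
            keep.append(row)  # tie: keep this row as well
    return keep


def collect_min_rows(folder, out_path):
    """Append the minimum-date rows of every .csv file in `folder` to `out_path`."""
    with open(out_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            # Assumption: only regular files ending in .csv are inputs.
            if not name.lower().endswith(".csv") or not os.path.isfile(path):
                continue
            with open(path, newline="", encoding="utf-8", errors="replace") as f:
                for row in rows_with_min_date(csv.reader(f)):
                    print(row)
                    writer.writerow(row)
```

Called as collect_min_rows('/Users/xyz/Desktop/TextAnalysis/PointsOfOrigin/trigramsdated/', 'trigrampointsoforigin.csv'), it appends to the output file just as the original script does. Opening both files with newline='' follows the csv module documentation's recommendation for files passed to csv.reader and csv.writer.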