Find the minimum value in a CSV and print every row that contains it in Python

Posted: 2014-12-09 01:20:59

Tags: python python-2.7 csv

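Thanks very much in advance for any help. I'm trying to write a script that goes through a folder of CSV files, finds the minimum value in the second column, and prints every row that contains it. The CSV files the script reads look like this: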

TPN,12010,on this date,25,0.00005047619239909304377497309619
TPN,12011,on this date,23,0.00003797836224092152019127884704
TPN,12012,on this date,78,0.0001130474103447076420049393022
TPN,12020,on this date,27,0.00005671375308512314236202279053
TPN,12021,on this date,60,0.00009856619048244864701475864425

The script looks like this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

identity = []
for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in column[1]:
                identity.append(row)
            else:
                print "No match"
        print identity

The error I get is:

  File "findfirsttrigram.py", line 12, in <module>
    identity.append("a")
NameError: name 'identity' is not defined

I also tried this:

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1               
        datatype = int
        data = (datatype(row[column]) for row in incsv)   
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in row:
                print row
            else:
                print "No match"

But that didn't work either. It doesn't give me an error, but it doesn't print "No match" either, so I don't know where to go from here. Please help!!

3 Answers:

Answer 0 (score: 4)

You can do the following:

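import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for each_file in os.listdir(folder):
    # os.listdir returns bare file names, so join them with the folder path
    with open(os.path.join(folder, each_file)) as f:
        m = min(int(line[1]) for line in csv.reader(f))
        f.seek(0)  # rewind so the file can be read a second time
        for line in csv.reader(f):
            if int(line[1]) == m:
                print line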

Answer 1 (score: 2)

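The reason no match is found is that you convert the column to int when you look for the minimum, but when you look for that value as part of the row, the field is still the string that was read from the file. Try changing your code to: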

for row in incsv:
    if int(row[column])==least_value:
        print row
    else:
        print "No match"
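To make the type mismatch concrete, here is a small sketch using the first sample row, split the way csv.reader returns it (every field is a string). `least_value` is a hypothetical stand-in for the int that min() produced; the single-argument print() calls run under both Python 2.7 and 3.

```python
# One sample row as csv.reader yields it: all fields are strings.
row = ['TPN', '12010', 'on this date', '25', '0.00005047619239909304377497309619']
least_value = 12010  # hypothetical: the int that min() returned

# The membership test compares the int 12010 with each string, so it never matches.
print(least_value in row)          # False
# Casting the column first puts ints on both sides, so the match is found.
print(int(row[1]) == least_value)  # True
```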

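As for the other error, the global identity seems to be inaccessible inside the with block. You can either bring it back in with global or not use a with block at all.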

Answer 2 (score: 1)

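Ashalynd covered why the value test fails. However, the reason your "No match" statement is never executed is that your csv reader can't iterate over the data twice. Take a simple example like this: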

with open(filename) as inf:
    incsv = csv.reader(inf)
    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines

Assuming there are 999 records, it will print the following:

999
0
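The same effect can be reproduced without a real file. This sketch assumes Python 3 and uses an in-memory io.StringIO holding three of the sample rows in place of one of the CSV files; it also shows that rewinding the underlying file object with seek(0) lets the same reader see every row again.

```python
import csv
import io

# In-memory stand-in for one CSV file (three sample rows from the question).
inf = io.StringIO(
    "TPN,12010,on this date,25,0.00005047619239909304377497309619\n"
    "TPN,12011,on this date,23,0.00003797836224092152019127884704\n"
    "TPN,12012,on this date,78,0.0001130474103447076420049393022\n"
)
incsv = csv.reader(inf)

first_pass = sum(1 for _ in incsv)   # reads to the end of the file
second_pass = sum(1 for _ in incsv)  # nothing left to read
inf.seek(0)                          # rewind the underlying file object
third_pass = sum(1 for _ in incsv)   # the same reader now sees every row again

print(first_pass, second_pass, third_pass)  # 3 0 3
```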

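That's because at the end of the first pass, the file object's position is at the end of the file. You need to reset it to the beginning before you can iterate over the data again. Call inf.seek(0) between the two loops and the second count should come out right. I'm pretty sure this will work: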

import csv
import os

folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'

for filename in os.listdir(folder):
    # os.listdir returns bare file names, so join them with the folder path
    with open(os.path.join(folder, filename), 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        # min() consumes this generator, which reads the file to the end
        data = (datatype(row[column]) for row in incsv)
        least_value = min(data)
        print least_value
        # This resets the file's current position so it can be read again
        inf.seek(0)
        for row in incsv:
            # Compare against the value cast to the same type as least_value
            if least_value == datatype(row[column]):
                print row
            else:
                print "No match"
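For comparison, a single-pass variant avoids the rewind entirely: read each file's rows into a list once, take the minimum, then filter that list. This is a sketch for Python 3 (print() function, open(..., newline='') as the csv docs recommend), not code from any of the answers; the helper names min_rows and print_min_rows, and the choice to skip files whose names don't end in .csv, are assumptions for illustration.

```python
import csv
import os


def min_rows(rows, column=1):
    """Return every row whose `column` holds the smallest integer value."""
    rows = [row for row in rows if row]  # materialize once; drop blank lines
    if not rows:
        return []
    least_value = min(int(row[column]) for row in rows)
    # Compare ints with ints, so the string-vs-int mismatch cannot occur.
    return [row for row in rows if int(row[column]) == least_value]


def print_min_rows(folder, column=1):
    """Print the minimum rows of every .csv file in `folder` (hypothetical helper)."""
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith('.csv'):
            continue  # assumption: only files ending in .csv should be read
        # os.listdir returns bare names, so join them with the folder path.
        with open(os.path.join(folder, filename), newline='') as inf:
            for row in min_rows(csv.reader(inf), column):
                print(filename, row)
```

Calling print_min_rows(folder) with the folder path from the question would print each file name next to every row that ties for the minimum. Holding one file's rows in memory is assumed to be fine for files of this size; for very large files, the two-pass seek(0) approach uses less memory.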