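Thank you very much for any help. I am trying to write a script that goes through a folder of csv files, finds the minimum value in the second column, and prints every row that contains it. The csv files the script reads look like this: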
TPN,12010,on this date,25,0.00005047619239909304377497309619
TPN,12011,on this date,23,0.00003797836224092152019127884704
TPN,12012,on this date,78,0.0001130474103447076420049393022
TPN,12020,on this date,27,0.00005671375308512314236202279053
TPN,12021,on this date,60,0.00009856619048244864701475864425
The script looks like this:
import csv
import os
folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'
identity = []
for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        data = (datatype(row[column]) for row in incsv)
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in column[1]:
                identity.append(row)
            else:
                print "No match"
print identity
The error I get is:
File "findfirsttrigram.py", line 12, in <module>
identity.append("a")
NameError: name 'identity' is not defined
I also tried this:
import csv
import os
folder = '/Users/Documents/Senior/Thesis/Python/TextAnalysis/datedmatchedngrams2/'
for filename in os.listdir (folder):
    with open(filename, 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        data = (datatype(row[column]) for row in incsv)
        least_value = min(data)
        print least_value
        for row in incsv:
            if least_value in row:
                print row
            else:
                print "No match"
But that didn't work either. It doesn't give me an error, but it also doesn't print "No match", so I don't know where to go from here. Please help!!
Answer 0 (score: 4)
You can do the following:
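import csv

# each_file is one path from the folder, e.g. from:
# for each_file in os.listdir(folder):
with open(each_file) as f:
    # First pass: find the smallest integer in the second column.
    m = min(int(line[1]) for line in csv.reader(f))
    # Rewind so the second pass starts from the top of the file.
    f.seek(0)
    for line in csv.reader(f):
        if int(line[1]) == m:
            print line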
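If the files are small enough to fit in memory, an alternative to rewinding the file is to read all rows into a list once and filter that list. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The function name rows_with_min and the shortened sample data are illustrative assumptions, not part of the original answer.

```python
import csv
import io


def rows_with_min(lines, column=1):
    """Return every row whose integer value in `column` equals the minimum."""
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    if not rows:
        return []
    least = min(int(row[column]) for row in rows)
    return [row for row in rows if int(row[column]) == least]


# Sample data modelled on the question; io.StringIO stands in for an open file.
sample = io.StringIO(
    "TPN,12010,on this date,25,0.00005\n"
    "TPN,12011,on this date,23,0.00003\n"
    "TPN,12010,on this date,78,0.00011\n"
)
matches = rows_with_min(sample)
for row in matches:
    print(row)
```

This trades memory for simplicity: the file is read only once, so no seek(0) is needed, but every row is held in the list until the function returns.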
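Answer 1 (score: 2)
The reason the minimum is never found is that you convert the column to int when you compute the minimum, but when you look at it as part of the row you have just read, it is still a string. Try changing your code to: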
for row in incsv:
    if int(row[column]) == least_value:
        print row
    else:
        print "No match"
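The type mismatch is easy to see in isolation. This short sketch (Python 3 syntax, with one row copied from the question's sample data) shows that csv.reader yields strings, so an int minimum is neither found in the raw row nor equal to the raw field until one side is converted.

```python
import csv

# One line from the question's sample data; csv.reader accepts any iterable of strings.
row = next(csv.reader(["TPN,12010,on this date,25,0.00005047619239909304377497309619"]))
least_value = int(row[1])  # 12010 as an int, as min(int(...)) would produce

found_in_row = least_value in row              # int compared against strings
equal_as_string = row[1] == least_value        # '12010' == 12010
equal_after_cast = int(row[1]) == least_value  # both sides are ints

print(found_in_row, equal_as_string, equal_after_cast)
```

Only the last comparison is true, which is why the test in the loop has to cast the field before comparing.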
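As for the other error, it looks as though the global identity is not accessible inside the with clause. You could reintroduce it with global or drop the with clause, although a with block does not create a new scope in Python, so also check that identity = [] is defined before the line named in the traceback.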
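Answer 2 (score: 1)
Ashalynd covered why the value test fails. However, the reason your "No match" statement is never reached is that your csv reader cannot iterate over the data twice. Take a simple example like this: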
with open(filename) as inf:
    incsv = csv.reader(inf)
    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines
    total_lines = 0
    for line in incsv:
        total_lines += 1
    print total_lines
Assuming there are 999 records, it will output the following:
999
0
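The same behaviour can be reproduced without a 999-line file. In this minimal sketch (Python 3 syntax), io.StringIO stands in for the open file, which is an assumption made only to keep the example self-contained. It counts the rows twice through the same reader, then once more after rewinding the underlying stream.

```python
import csv
import io

# io.StringIO stands in for an open file (an assumption for this sketch).
inf = io.StringIO("a,1\nb,2\nc,3\n")
incsv = csv.reader(inf)

first_pass = sum(1 for _ in incsv)   # consumes every row
second_pass = sum(1 for _ in incsv)  # nothing left to read

inf.seek(0)                          # rewind the underlying stream
third_pass = sum(1 for _ in csv.reader(inf))

print(first_pass, second_pass, third_pass)
```

The first count is 3, the second is 0, and the count taken after rewinding is 3 again.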
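That is because at the end of the first iteration the file object's position is at the end of the file. You need to reset it to the beginning of the file to iterate over the data again. Add inf.seek(0) and your second example should be fine. Pretty sure this will work: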
for filename in os.listdir(folder):
    # os.listdir returns bare names, so join them with the folder
    with open(os.path.join(folder, filename), 'rb') as inf:
        incsv = csv.reader(inf)
        column = 1
        datatype = int
        data = (datatype(row[column]) for row in incsv)
        # Consuming the generator leaves the file's position at the end
        least_value = min(data)
        print least_value
        # Reset the file's position so the rows can be read again
        inf.seek(0)
        for row in incsv:
            # Compare against the row value cast to the same type
            if least_value == datatype(row[column]):
                print row
            else:
                print "No match"
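For readers on Python 3, the two-pass approach might be packaged roughly as follows. This is a sketch under stated assumptions rather than the answerer's code: the function name min_rows_per_file, the restriction to names ending in .csv, and the dict return value are all illustrative choices, and the rows are assumed to be well formed (no blank lines, an integer in the chosen column).

```python
import csv
import os


def min_rows_per_file(folder, column=1, datatype=int):
    """Map each CSV file name in `folder` to the rows holding the column minimum."""
    results = {}
    for filename in sorted(os.listdir(folder)):
        if not filename.endswith(".csv"):
            continue  # assumption: only *.csv files are of interest
        path = os.path.join(folder, filename)  # listdir returns bare names
        with open(path, newline="") as inf:
            # First pass: find the minimum (None if the file has no rows).
            least_value = min(
                (datatype(row[column]) for row in csv.reader(inf)),
                default=None,
            )
            inf.seek(0)  # rewind before the second pass
            # Second pass: keep the rows whose cast value equals the minimum.
            results[filename] = [
                row for row in csv.reader(inf)
                if datatype(row[column]) == least_value
            ]
    return results
```

Called with the folder from the question, this would return a dict whose keys are file names and whose values are the matching rows, which the caller can then print or collect into a single list.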