I'm writing a script that reads files from several directories and then uses each file's ID to search a csv file. Here is the code:
import os
import glob
searchfile = open("file.csv", "r")
train_file = open('train.csv','w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print(id[0]) # ID
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",")
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
searchfile.close()
train_file.close()
However, only a couple of the IDs are ever found in the csv file. Can someone point out my mistake? (See the comments in the code.)
EDIT: a sample line from the text file:
192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg
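To make clear which fields the scripts in this thread pick out with line.split(","), here is a small sketch that parses the sample line above with Python 3's standard csv module. The question does not say what each column means, so only positions are shown; using io.StringIO in place of the real file.csv is an assumption for the sake of a self-contained example.

```python
import csv
import io

# The sample line from the question, held in memory instead of in file.csv.
sample = "192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg\n"

# csv.reader accepts any iterable of lines; io.StringIO stands in for the open file.
row = next(csv.reader(io.StringIO(sample)))

file_id = row[0]       # the ID, which matches the image file name
description = row[1]   # the text between the first and second commas
third_field = row[2]   # empty in this sample: two commas in a row

# This is what the scripts below write after the ID: row[1] + " " + row[2].
print(file_id + "," + description + " " + third_field)
```

Note that the third field is empty here, so the written description ends in a trailing space.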
Answer 0 (score: 1)
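Your problem is that for line in searchfile: loops over an iterator. The file is not reset for each id: if the first id you pass in is on line 50, the search for the next id starts at line 51. As soon as one id is not found, the loop runs to the end of the file, and every later search finds nothing.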
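The effect is easy to reproduce. The sketch below uses io.StringIO as an in-memory stand-in for file.csv and a hypothetical helper, find, that mirrors the question's inner loop (scan until a match, then stop). It also shows seek(0), which rewinds the file so each search starts at the first line; that is an alternative to the fixes in these answers, not something they propose.

```python
import io

def find(file_obj, key):
    """Return the first line containing key, scanning from the file's current position."""
    for line in file_obj:
        if key in line:
            return line   # the position is left just after this line
    return None           # the file is now exhausted

# Behaves like an open text file: iterating it advances one shared position.
search_file = io.StringIO("a,1\nb,2\nc,3\n")

print(find(search_file, "b"))   # found on the second line
print(find(search_file, "a"))   # None: the scan resumed after "b,2", so "a,1" is skipped

search_file.seek(0)             # rewind before the next search
print(find(search_file, "a"))   # found again
```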
Instead, you can read the file into a list and loop over the list:
import os
import glob

# Read all lines once; a list can be looped over from the start as often as needed.
with open("file.csv", "r") as s:
    search_file = s.readlines()
train_file = open('train.csv', 'w')
list_of_files = os.listdir("train")
for l in list_of_files:
    dir_list = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dir_list:
        fname = os.path.splitext(os.path.basename(d))
        print(fname[0]) # ID
        for line in search_file:
            if fname[0] in line: # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(fname[0] + "," + value) # write description
                break
train_file.close()
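I made a few other changes. First, you shouldn't use the name id, because it already means something in Python (it is a built-in function), so I chose fname, short for file name. Second, I converted the CamelCase names to lowercase, as is the convention. Finally, combining os.path.splitext and os.path.basename is a cleaner and more reliable way to separate the file name from its extension.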
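A quick illustration of that last point, as a sketch: the path below is hypothetical and only follows the train/&lt;subdir&gt;/&lt;id&gt;.jpg layout from the question. It also shows one concrete difference from the question's split(".") approach when a file name contains more than one dot.

```python
import os.path

# Hypothetical image path in the question's layout.
path = os.path.join("train", "some_subdir", "192397335.jpg")

name = os.path.basename(path)        # last path component, using the platform's separator rules
stem, ext = os.path.splitext(name)   # splits off only the final extension
print(stem, ext)

# With several dots, split(".")[0] keeps only the text before the first dot,
# while splitext removes just the last extension.
print("img.v2.jpg".split(".")[0])
print(os.path.splitext("img.v2.jpg")[0])
```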
Answer 1 (score: 1)
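You need to go through the lines of the search file for every id you find, but because you open the file outside the loop, each line can be read only once over the whole loop.
Either load the whole file into a list and iterate over that list inside the loop, or, if searchfile is very large and barely fits in memory, reopen the file inside the loop: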
List version:
import os
import glob

with open("file.csv", "r") as searchfile:
    searchlines = searchfile.readlines()
train_file = open('train.csv','w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print(id[0]) # ID
        for line in searchlines: # now a list so start at the beginning on each pass
            if id[0] in line: # search in csv file
                value= line.split(",")
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
train_file.close()
Reopen version:
import os
import glob

train_file = open('train.csv','w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print(id[0]) # ID
        searchfile = open("file.csv", "r") # reopened for every ID, so each search starts at line 1
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",")
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
        searchfile.close()
train_file.close()
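Both versions still scan file.csv once per image. A different technique, swapped in here rather than taken from either answer, is to build a dictionary from ID to description once and then look each ID up directly. This sketch assumes the ID is always the first csv column, as in the sample line from the question; it also avoids substring false positives, where an ID such as 12 would match a line for ID 1112. The helper build_index and the rows below are hypothetical placeholders, and io.StringIO stands in for file.csv and train.csv.

```python
import csv
import io

def build_index(csv_lines):
    """Map the first column (assumed to be the ID) to 'field1 field2', as the scripts above write it."""
    index = {}
    for row in csv.reader(csv_lines):
        # Keep the first matching row per ID, like the break in the original loops.
        if len(row) >= 3 and row[0] not in index:
            index[row[0]] = row[1] + " " + row[2]
    return index

# Hypothetical placeholder rows standing in for file.csv.
data = io.StringIO(
    "111,first item,red,1.00,111.jpg\n"
    "222,second item,,2.50,222.jpg\n"
    "1112,third item,blue,3.00,1112.jpg\n"
)
index = build_index(data)

out = io.StringIO()  # stands in for train.csv
for file_id in ["111", "222", "12"]:  # hypothetical IDs taken from image file names
    if file_id in index:  # exact match on the ID column, so "12" does not match "1112"
        out.write(file_id + "," + index[file_id] + "\n")

print(out.getvalue())
```

Building the index costs one pass over the csv file, after which each lookup is a dictionary access instead of a scan.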