Searching in a csv file

Date: 2017-04-06 08:39:01

Tags: python csv

I am writing a script that reads files from different directories; I then use each file's ID to search in a csv file. Here is the code.

import os
import glob

searchfile = open("file.csv", "r")
train_file = open('train.csv','w')



listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
searchfile.close()
train_file.close()

However, I can only find a couple of IDs in the csv file. Can someone point out my mistake? (See the comments in the code for details.)

EDITED

A sample line from the text file:

192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg

2 answers:

Answer 0 (score: 1)

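Your problem is that when you do for line in searchfile: you are iterating over the file object, which, like a generator, can only be consumed once. The file is not reset for each id - for example, if the first id you pass it is found on line 50, the search for the next id starts at line 51.

Instead, you can read the file into a list and then loop over the list: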
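
To see this behaviour in isolation, here is a minimal sketch written for Python 3. It uses io.StringIO as a stand-in for the opened csv file (an assumption, so that the snippet runs without any files on disk); the three rows and the helper name find are made up for illustration.

```python
import io

# Stand-in for open("file.csv"); the rows are made-up sample data.
searchfile = io.StringIO("111,first\n222,second\n333,third\n")

def find(file_obj, wanted):
    """Return the first not-yet-read line containing `wanted`, or None."""
    for line in file_obj:
        if wanted in line:
            return line
    return None

hit_222 = find(searchfile, "222")   # consumes lines 1 and 2
hit_111 = find(searchfile, "111")   # only line 3 is left, so nothing matches

searchfile.seek(0)                  # rewind to the start of the file
hit_111_after_rewind = find(searchfile, "111")

print(repr(hit_222))                # '222,second\n'
print(repr(hit_111))                # None
print(repr(hit_111_after_rewind))   # '111,first\n'
```

The second search misses a row that is clearly present because the read position was never reset. Calling seek(0) before each search would also work, although it rereads the file from disk for every id.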

import os
import glob

with open("file.csv", "r") as s:
    search_file = s.readlines()

train_file = open('train.csv', 'w')

list_of_files = os.listdir("train")
for l in list_of_files:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        fname = os.path.splitext(os.path.basename(d))
        print fname[0] # ID
        for line in search_file:
            if fname[0] in line: # search in csv file
                value = line.split(",") 
                value = value[1]+" " + value[2] + "\n"
                train_file.write(fname[0]+","+value) # write description
                break

train_file.close()

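I made a few other changes. First, you should not use the name id, because it is a built-in function in Python - I chose fname to indicate the file name. Second, I changed camelCase names such as listOfFiles to lowercase with underscores, as is the convention. Finally, getting the file name and extension is cleaner and more consistent with a combination of os.path.splitext and os.path.basename.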
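
As a small illustration of that last point (Python 3), the sketch below splits a path into file name and extension. The path itself is hypothetical, modelled on the sample row in the question.

```python
import os

# Hypothetical image path, modelled on the sample row in the question.
path = "train/some_dir/192397335.jpg"

base = os.path.basename(path)          # "192397335.jpg"
fname, ext = os.path.splitext(base)    # ("192397335", ".jpg")

print(fname)
print(ext)

# Unlike split(".")[0], splitext only strips the final extension,
# so a file name that itself contains dots stays intact.
dotted_fname = os.path.splitext("12.34.jpg")[0]   # "12.34"
print(dotted_fname)
```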

Answer 1 (score: 1)

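You need to go through the lines of the search file for every id you find, but because you open the file outside the loop, each line is read only once over the whole run.

You should either load the whole file into a list and iterate over that list of lines inside the loop, or, if searchfile is very large and would barely fit in memory, reopen the file inside the loop:

List version: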

import os
import glob

with open("file.csv", "r") as searchfile:
    searchlines = searchfile.readlines()

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchlines:   # now a list so start at the beginning on each pass
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
train_file.close()

Reopen version:

import os
import glob

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        searchfile = open("file.csv", "r")
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
        searchfile.close()
train_file.close()
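
Both versions above rescan the csv lines for every image. As a possible alternative that neither answer proposes, the lines could be indexed once in a dictionary keyed by ID, so that each lookup is a single exact match rather than a substring scan. The following is only a sketch for Python 3: it assumes the ID is the first column, as in the sample row from the question, and the helper names build_index and describe are hypothetical. The sample row is held in memory with io.StringIO so the snippet runs without file.csv.

```python
import csv
import io
import os

def build_index(csv_lines):
    """Map the ID in the first column to its full row (a list of fields)."""
    index = {}
    for row in csv.reader(csv_lines):
        if row:                        # skip blank lines
            index[row[0]] = row
    return index

def describe(image_path, index):
    """Return "id,description" for an image path, or None if the ID is unknown."""
    file_id = os.path.splitext(os.path.basename(image_path))[0]
    row = index.get(file_id)
    if row is None or len(row) < 3:
        return None
    # Same fields as the original code: columns 1 and 2 joined by a space.
    return file_id + "," + row[1] + " " + row[2]

# The sample row from the question, used in place of file.csv.
sample = io.StringIO(
    "192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 "
    "6 cassetti,,691.74,192397335.jpg\n"
)
index = build_index(sample)

result = describe("train/some_dir/192397335.jpg", index)   # hypothetical path
missing = describe("train/some_dir/000.jpg", index)        # ID not in the csv

print(result)
print(missing)
```

With real data, build_index could be given the open file.csv object, and each non-None result written to train.csv followed by a newline. Matching on the exact first column also avoids a pitfall of the in test, which would accept a line where the ID merely appears inside another field.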