Question

I have an input file containing keywords, and I need to filter a CSV file based on those keywords.

Here is my attempt to automate the task with Python:

import csv
with open('Input.txt', 'rb') as InputFile:
    with open('28JUL2017.csv', 'rb') as CM_File:
        read_Input=csv.reader(InputFile)
        for row1 in csv.reader(InputFile):
            #print row1

            read_CM=csv.reader(CM_File)
            next(read_CM, None)
            for row2 in csv.reader(CM_File):
                #print row2
                if row1[0] == row2[0] :

                    Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output

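I only get the first row from the file I want to filter. I have tried all sorts of things but cannot figure out where I am going wrong. Please point out my mistake.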

Answer 1

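read_Input and read_CM are essentially iterators. Once you have iterated over them, they are exhausted: you cannot iterate over them a second time. Creating a new csv.reader over the same file object does not help either, because the file itself is already positioned at its end. If you insist on doing it your way, then every time you want to start a new loop you have to go back to the beginning of the file and "re-read" the CSV file. Here is a fix: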

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile)
        for row1 in read_Input:
            CM_File.seek(0) # rewind to the beginning of the file
            read_CM = csv.reader(CM_File) # fresh reader for each pass
            next(read_CM, None) # skip the first line
            for row2 in read_CM:
                if row1[0] == row2[0]:
                    Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output
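
To see why the rewind matters, here is a minimal sketch of the exhaustion behaviour. It is written for Python 3 and uses an in-memory io.StringIO with made-up sample rows in place of a real file; both are assumptions for illustration, not part of the original code.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample data).
data = io.StringIO("id,name\n1,foo\n2,bar\n")

reader = csv.reader(data)
first_pass = list(reader)    # consumes every row
second_pass = list(reader)   # nothing left: the reader is exhausted

data.seek(0)                 # rewind the underlying file object
third_pass = list(csv.reader(data))  # a fresh reader sees all rows again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```

The second pass is empty because the reader only pulls lines from the file object, and that object is already at its end until seek(0) moves it back.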

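Rather than rewinding and re-reading the file for every keyword, I suggest reading each file only once. Also, instead of using nested loops, build a "keyword list" and simply check whether row2[0] is in that list: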

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile) # read file only once
        keywords = [rec[0] for rec in read_Input]
        read_CM = csv.reader(CM_File) # read file only once
        next(read_CM, None) # not sure why you do this? to skip first line?
        for row2 in read_CM:
            if row2[0] in keywords:
                Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                print("Output: {}".format(Output))
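
The same idea could be sketched for Python 3 roughly as follows. This is only an illustration under some assumptions: the helper name filter_rows, its columns parameter, and the sample data are hypothetical, and a set is swapped in for the list because membership tests on a set do not scan every keyword. Blank lines are skipped so that row[0] cannot raise an IndexError.

```python
import csv
import io

def filter_rows(keyword_file, data_file, columns=(0, 1, 5, 6)):
    """Yield the selected columns of each data row whose first field is a keyword."""
    # Read the keyword file once; ignore blank lines.
    keywords = {row[0] for row in csv.reader(keyword_file) if row}
    reader = csv.reader(data_file)
    next(reader, None)  # skip the first line, as the original code does
    for row in reader:
        if row and row[0] in keywords:
            yield [row[i] for i in columns]

# Hypothetical sample data standing in for the two files.
keyword_file = io.StringIO("A\nC\n")
data_file = io.StringIO(
    "key,c1,c2,c3,c4,c5,c6\n"
    "A,1,2,3,4,5,6\n"
    "B,1,2,3,4,5,6\n"
    "C,7,8,9,10,11,12\n"
)

result = list(filter_rows(keyword_file, data_file))
for row in result:
    print(",".join(row))
# prints:
# A,1,5,6
# C,7,11,12
```

With real files in Python 3 you would open them in text mode, e.g. open(path, newline=''), rather than 'rb'. Note that this sketch still assumes every matching row has at least seven columns; shorter rows would raise an IndexError.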

Read two files and filter the second file based on a column of the first
