Read two files and filter the second file based on a column of the first

Posted: 2017-07-31 18:17:58

Tags: python loops csv for-loop

I have an input file containing keywords, and I need to filter a CSV file based on those keywords.

Here is my attempt at automating the task with Python:

import csv
with open('Input.txt', 'rb') as InputFile:
    with open('28JUL2017.csv', 'rb') as CM_File:
        read_Input=csv.reader(InputFile)
        for row1 in csv.reader(InputFile):
            #print row1

            read_CM=csv.reader(CM_File)
            next(read_CM, None)
            for row2 in csv.reader(CM_File):
                #print row2
                if row1[0] == row2[0] :

                    Output= row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                    print Output

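I only get the first row from the file I am filtering. I have tried all sorts of things but cannot figure out where I am going wrong. Please point out my mistake.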
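A stripped-down reproduction of this behaviour, written for Python 3 as a sketch: it uses in-memory io.StringIO objects as hypothetical stand-ins for Input.txt and 28JUL2017.csv, and the keywords and CSV rows are made up for illustration. The loop structure mirrors the code above, and only the first keyword ever produces a match:

```python
import csv
import io

# Hypothetical stand-ins for Input.txt and 28JUL2017.csv (made-up contents)
keywords_file = io.StringIO("AAA\nBBB\n")
data_file = io.StringIO(
    "SYMBOL,A,B,C,D,E,F\n"
    "AAA,1,2,3,4,5,6\n"
    "BBB,7,8,9,10,11,12\n"
)

matches = []
for row1 in csv.reader(keywords_file):
    # A new reader is created for every keyword, but it wraps the SAME file
    # object, which the first pass has already read to the end
    read_cm = csv.reader(data_file)
    next(read_cm, None)  # skips the header only while lines remain
    for row2 in read_cm:
        if row1[0] == row2[0]:
            matches.append(",".join([row2[0], row2[1], row2[5], row2[6]]))

print(matches)  # ['AAA,1,5,6']: the BBB row is never reached
```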

1 answer:

Answer 0 (score: 1)

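read_Input and read_CM are essentially iterators. Once you have iterated through them, you are done: you cannot iterate over them twice. The same applies to the underlying file object CM_File, so after the inner loop has run for the first keyword, the file is at its end and every later keyword finds no rows at all. If you insist on doing it your way, you have to go back to the beginning of the file and "re-read" the CSV file every time you start a new inner loop. Here is a fix: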

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile)
        for row1 in read_Input:
            CM_File.seek(0)  # rewind to the beginning of the file
            read_CM = csv.reader(CM_File)
            next(read_CM, None)  # skip the header row again on every pass
            for row2 in read_CM:
                if row1[0] == row2[0]:
                    Output = row2[0] + "," + row2[1] + "," + row2[5] + "," + row2[6]
                    print(Output)
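Rewinding with seek(0) works because it resets the file position that the previous pass left at the end of the file. A short Python 3 sketch, using an io.StringIO object as a hypothetical stand-in for CM_File (contents made up), shows the same file yielding nothing on a second pass and every line again after the rewind:

```python
import io

# Hypothetical in-memory file standing in for CM_File
f = io.StringIO("header\nrow1\nrow2\n")

first_pass = list(f)   # consumes every line and leaves the position at the end
second_pass = list(f)  # nothing left to read
f.seek(0)              # rewind, as CM_File.seek(0) does in the fix above
third_pass = list(f)   # all lines are available again

print(first_pass)   # ['header\n', 'row1\n', 'row2\n']
print(second_pass)  # []
print(third_pass)   # ['header\n', 'row1\n', 'row2\n']
```

Note that this fix still re-reads the entire second file once for every keyword, which is why the approach below is preferable.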

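Rather than doing that, I suggest looping over rows you have already read instead of re-reading the file. Also, instead of using nested loops, build a "list of keywords" and simply check whether row2[0] is in that list: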

import csv
with open('file1.csv', 'rb') as InputFile:
    with open('file2.csv', 'rb') as CM_File:
        read_Input = csv.reader(InputFile) # read file only once
        keywords = [rec[0] for rec in read_Input]
        read_CM = csv.reader(CM_File) # read file only once
        next(read_CM, None) # not sure why you do this? to skip first line?
        for row2 in read_CM:
            if row2[0] in keywords:
                Output = row2[0]+","+row2[1]+","+row2[5]+","+row2[6]
                print("Output: {}".format(Output))
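The code above opens the files in 'rb' mode, which is what the csv module expects in Python 2; in Python 3, csv.reader needs text-mode files, and the csv documentation recommends opening them with newline=''. A sketch of the same approach for Python 3, under a few assumptions: the keywords go into a set so each membership test is a hash lookup rather than a scan of a list, blank lines in the keyword file are ignored, and the function name filter_by_keywords is hypothetical. The column indices 0, 1, 5 and 6 are taken from the question:

```python
import csv


def filter_by_keywords(keyword_file, data_file):
    """Yield columns 0, 1, 5 and 6 of each data row whose first field is a keyword.

    keyword_file and data_file can be open text files or any iterables of
    CSV lines. (Hypothetical helper, not part of the original answer.)
    """
    # Read the keyword file once; skip blank lines, which csv yields as []
    keywords = {row[0] for row in csv.reader(keyword_file) if row}
    reader = csv.reader(data_file)
    next(reader, None)  # skip the header row of the data file
    for row in reader:
        if row and row[0] in keywords:
            yield [row[0], row[1], row[5], row[6]]
```

For example, inside with open('file1.csv', newline='') as InputFile, open('file2.csv', newline='') as CM_File:, the call csv.writer(sys.stdout).writerows(filter_by_keywords(InputFile, CM_File)) would print the matching columns (after import sys). Using csv.writer instead of joining with "," also quotes any field that itself contains a comma.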