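How to correctly loop over two files and compare the strings in them

Time: 2014-02-12 23:08:14

Tags: python csv

I am trying to compare two CSV files and expect the output shown below, but I cannot get it to work. Here are my sample data and code: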

File1.csv

meNOG00110,9606.ENSP00000349259,1,2364

meNOG06332,9606.ENSP00000344967,1,322

meNOG06773,9606.ENSP00000344961,1,379

meNOG03133,9606.ENSP00000387429,1,2089

meNOG17468,9606.ENSP00000217169,1,298

File2.csv

meNOG06332,9606.ENSP00000344967,1,322

meNOG00110,9606.ENSP00000349259,1,2364

meNOG00110,9606.ENSP00000357130,1,2419

meNOG00018,10090.ENSMUSP00000027367,1,261

meNOG00018,10090.ENSMUSP00000072852,1,276

output.txt

meNOG06332  9606.ENSP00000344967    1   322

meNOG00110  9606.ENSP00000349259    1   2364

meNOG00018  10090.ENSMUSP00000027367    1   261

meNOG00018  10090.ENSMUSP00000072852    1   276

Code:

import csv

file1 = open("File1.csv", "rU")
reader1 = csv.reader(file1,delimiter=',')

file2 = open("File2.csv", "rU")
reader2 = csv.reader(file2,delimiter=',')

for row2 in reader2:
    for row1 in reader1:
        if row2[1].startswith('9606'):
            if row2[1] == row1[1]:
                print row2
        else:
            print row2

But this code only searches the first row.
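A likely reason for this behaviour: `csv.reader` is an iterator over the open file, so the inner loop consumes `reader1` completely while the first row of File2 is being processed, and nothing is left to read for the later rows. The following is a minimal sketch of one way around it, written for Python 3. The function name `filter_rows` and the in-memory copies of the sample data are illustrative assumptions, and the tab-separated output is a guess at the format of output.txt. It collects the second column of File1 into a set once, then makes a single pass over File2.

```python
import csv
import io

# In-memory copies of the sample files from the question.
FILE1 = """meNOG00110,9606.ENSP00000349259,1,2364
meNOG06332,9606.ENSP00000344967,1,322
meNOG06773,9606.ENSP00000344961,1,379
meNOG03133,9606.ENSP00000387429,1,2089
meNOG17468,9606.ENSP00000217169,1,298
"""

FILE2 = """meNOG06332,9606.ENSP00000344967,1,322
meNOG00110,9606.ENSP00000349259,1,2364
meNOG00110,9606.ENSP00000357130,1,2419
meNOG00018,10090.ENSMUSP00000027367,1,261
meNOG00018,10090.ENSMUSP00000072852,1,276
"""


def filter_rows(handle1, handle2):
    """Keep rows of the second file whose ID is non-9606 or also in the first file."""
    # Read the first file exactly once and remember its second column.
    known_ids = {row[1] for row in csv.reader(handle1) if row}
    kept = []
    for row in csv.reader(handle2):
        if not row:
            continue  # skip blank lines
        if not row[1].startswith('9606') or row[1] in known_ids:
            kept.append(row)
    return kept


rows = filter_rows(io.StringIO(FILE1), io.StringIO(FILE2))

# Write the kept rows as tab-separated text (assumed output format).
out = io.StringIO()
csv.writer(out, delimiter='\t', lineterminator='\n').writerows(rows)
print(out.getvalue(), end='')
```

With real files, the two `io.StringIO` objects would be replaced by handles such as `open('File1.csv', newline='')`. The set lookup also avoids rescanning File1 for every row of File2.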

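3 answers:

Answer 0 (score: 0)

I'm not sure this is exactly what you want, since the question isn't clear on that point:

If you are looking for the overlap between the two files and want to compare whole lines, you can build two sets (one per file) and output their intersection: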

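with open('File1.csv', 'r') as infile1, \
     open('File2.csv', 'r') as infile2, \
     open('File3.csv', 'w') as outfile:
    lines1 = set(infile1)
    lines2 = set(infile2)

    # Each line is already comma-separated text, so write it back unchanged.
    # Passing a string to csv.writer.writerow() would split it into single characters.
    # Note that a set does not preserve the original line order.
    for line in (lines1 & lines2):
        outfile.write(line)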
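Two details of the whole-line approach are worth illustrating: a last line without a trailing newline will not compare equal to the same line followed by one, and a set loses the file order. The sketch below (Python 3; the helper name `common_lines` and the tiny sample lists are hypothetical) strips line endings before comparing and keeps the order of the second file. On the sample data from the question, a whole-line intersection would return only the two shared 9606 rows, not the meNOG00018 rows.

```python
def common_lines(lines1, lines2):
    """Return the lines of lines2 that also occur in lines1, in lines2 order."""
    # Normalise line endings so "c,3" and "c,3\n" compare equal.
    seen = {line.rstrip('\r\n') for line in lines1}
    stripped = (line.rstrip('\r\n') for line in lines2)
    return [line for line in stripped if line in seen]


file1_lines = ["a,1\n", "b,2\n", "c,3"]   # last line has no trailing newline
file2_lines = ["c,3\n", "x,9\n", "a,1\n"]

result = common_lines(file1_lines, file2_lines)
print(result)
```

Open file objects can be passed in place of the lists, since iterating over a file yields its lines.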

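Answer 1 (score: 0)

I'm not sure what result format you expect, but to compare two files you can use a module from the Python standard library:

http://docs.python.org/2/library/difflib.html

You can then parse and format its output as needed.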
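As a rough illustration of what `difflib` produces, here is a minimal Python 3 sketch using `difflib.unified_diff` on two short, made-up lists of lines (the sample data is an assumption, not taken from the question). Keep in mind that this is a sequence diff and therefore sensitive to line order, so it may not fit a lookup-by-ID comparison.

```python
import difflib

# Hypothetical file contents, already split into lines without newlines.
file1_lines = ['a,1', 'b,2', 'c,3']
file2_lines = ['a,1', 'c,3', 'd,4']

# lineterm='' because the input lines carry no trailing newline.
diff = list(difflib.unified_diff(
    file1_lines, file2_lines,
    fromfile='File1.csv', tofile='File2.csv',
    lineterm=''))

for line in diff:
    print(line)

# Lines only in the second file start with a single '+'.
added = [line[1:] for line in diff
         if line.startswith('+') and not line.startswith('+++')]
print(added)
```

Filtering on the leading `+` or `-` is one simple way to turn the diff into a list of added or removed lines.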

Answer 2 (score: 0)

You can zip the two files together:

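# path_a and path_b are the paths of the two files to compare
with open(path_a, 'r') as a, open(path_b, 'r') as b:
    for line_a, line_b in zip(a, b):
        # Strip the trailing newlines so each pair prints on one line
        print line_a.rstrip('\n'), line_b.rstrip('\n')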

If the first file is:

a
s
d
f

and the second file is:

q
w
e
r

the output will be:

a q
s w
d e
f r
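Note that `zip` pairs lines purely by position and stops at the end of the shorter file. If the files may differ in length, `itertools.zip_longest` keeps going and pads the missing side. The following Python 3 sketch uses in-memory stand-ins for the two files (the shortened second file and the empty-string fill value are assumptions chosen for illustration):

```python
import io
from itertools import zip_longest

# In-memory stand-ins for the two files; the second one is shorter.
file_a = io.StringIO("a\ns\nd\nf\n")
file_b = io.StringIO("q\nw\n")

# Pair lines by position, padding the shorter file with empty strings.
pairs = [
    (line_a.rstrip('\n'), line_b.rstrip('\n'))
    for line_a, line_b in zip_longest(file_a, file_b, fillvalue='')
]

for left, right in pairs:
    print(left, right)
```

Because the pairing is positional, this suits files whose rows are already aligned, and not lookups where matching rows can appear at different positions.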