Comparing two files with Python to get a matrix

Date: 2014-11-22 01:18:27

Tags: python

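I want to compare the contents of two files and get a matrix in which a match scores "1" and a non-match scores "0". For example, aer23 from file1.txt is searched against every element of file2.txt, and each column records whether it matched. So in the output, the contents of file1.txt become the rows and the contents of file2.txt become the columns.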

file1.txt:

aer23
aub1
fer4
qty1
sap89
xty32

file2.txt:

fer4
xty32
aer23
aub1
sap89
qty1

Output:

        fer4    xty32   aer23   aub1    sap89   qty1
aer23   0       0       1       0       0       0
aub1    0       0       0       1       0       0
fer4    1       0       0       0       0       0
qty1    0       0       0       0       0       1
sap89   0       0       0       0       1       0
xty32   0       1       0       0       0       0

My code:

outfile=open("out.txt","w")

record=[]
for line in open("file2.txt","r"):
    record.append(line)
    for line in open("file2.txt","r"):
        if line==iter(record):
            outfile.write("1","\t")
        else:
        outfile.write("0","\t")
        next

How can I make this code work? Thanks.

1 answer:

Answer 0 (score: 1)

What you want to do is:

outfile=open("out.txt","w")

# First you need to write the header row
outfile.write("\t")
for line2 in open("file2.txt","r"):
    outfile.write(line2.strip() + "\t")
outfile.write("\n")

# You never do anything useful with record, so don't build it
#record=[]

# Open file1 and file2, not file2 and file2, and don't reuse the name line
for line1 in open("file1.txt","r"):
    # You also need to write the header column
    outfile.write(line1.strip() + "\t")
    #record.append(line)
    for line2 in open("file2.txt","r"):
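        # Don't try to compare the string to a list iterator, compare it
        # to the string from the other file. Strip both so a missing
        # trailing newline on a file's last line can't break the match.
        if line1.strip()==line2.strip():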
            # You can't pass write multiple arguments like print, just
            # put the two strings together
            outfile.write("1\t")
        else:
            # Indentation matters in Python
            outfile.write("0\t")
        # next is a function that gets the next value from an iterator;
        # just referring to that function by name doesn't do anything
        #next
    # Don't forget to end each line
    outfile.write("\n")

# You should always close files, but _especially_ writable files
outfile.close()
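
One thing to watch when comparing lines from two files: each line read from a file keeps its trailing newline, so the last line of a file that does not end in a newline will not compare equal to the same text read from the other file. Here is a minimal sketch of that pitfall, using in-memory strings instead of the real files (the helper name same_entry is made up for illustration):

```python
def same_entry(line1, line2):
    """Compare two raw lines, ignoring surrounding whitespace and newlines."""
    return line1.strip() == line2.strip()

# A raw comparison fails when only one of the lines carries a trailing newline
raw_equal = ("qty1\n" == "qty1")

# Stripping both sides first makes the comparison robust
stripped_equal = same_entry("qty1\n", "qty1")

print(raw_equal, stripped_equal)
```

Stripping also drops any stray spaces or carriage returns around an entry, which is usually what you want for identifiers like these.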

This can be improved a lot, but it should be the smallest set of changes that gets you close to where you want to be.

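Rather than walking you through every change you could make one at a time, let me show you how I'd write it; you can look up all of the functions in the help: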

import csv
with open('file2.txt') as file2:
    columns = [line.strip() for line in file2]
with open('file1.txt') as file1, open('out.txt', 'w') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    writer.writerow([''] + columns)
    for line in file1:
        row = line.strip()
        writer.writerow([row] + [1 if row==column else 0 for column in columns])
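
If you want to try the same idea without creating the files first, the sketch below wraps it in a function and runs it on the sample data from the question. It assumes Python 3; the function name write_match_matrix and the in-memory buffer are my own choices for illustration, not part of the code above.

```python
import csv
import io

def write_match_matrix(rows, columns, outfile):
    """Write a tab-separated 0/1 matrix: one line per entry in rows,
    one column per entry in columns."""
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    # Header row: an empty corner cell followed by the column labels
    writer.writerow([""] + columns)
    for row in rows:
        writer.writerow([row] + [1 if row == column else 0 for column in columns])

# Sample data from the question, held in memory instead of read from files
rows = ["aer23", "aub1", "fer4", "qty1", "sap89", "xty32"]
columns = ["fer4", "xty32", "aer23", "aub1", "sap89", "qty1"]

buffer = io.StringIO()
write_match_matrix(rows, columns, buffer)
matrix_text = buffer.getvalue()
print(matrix_text)
```

To write to a real file, pass an open file object instead of the buffer. In Python 3 the csv documentation recommends opening such files with newline='' so the writer controls the line endings itself.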