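Getting a matrix from items in two files

Time: 2014-11-21 14:38:33

Tags: python

I have two files from which I want to build the presence (1) / absence (0) matrix shown below. If an item from fileB (or from col1; I am not sure which is the better input) matches the item in cols 2-4, record a score of "1"; otherwise record "0".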

File A:

 col1   col2    col3    col4
 esd    dus esd muq
 uum    uum dus esd
 dus    esd uum dus
 muq    muq muq uum

File B:

esd
uum
dus 
muq

My attempt:

out_file=open("out.txt", "w")
for itemA in open("fileA", "r") as file1:
    file2=open("fileB", "r")
    for row in file2:
        for col in file2:
            if itemA==file2[row][col]:
                out_file.write(int(1))
            else:
                out_file.write(int(0))

Expected output:

    col1    col2    col3
 esd    0   1   0
 uum    1   0   0
 dus    0   0   1
 muq    1   1   0  

Any help with the Python code would be much appreciated.

3 answers:

Answer 0 (score: 1)

Would something like this work for you?

with open('a.txt') as fh:
    for line in fh:
        cols = line.split()
        key = cols[0]
        # 1 if the column equals the key in column 1, 0 otherwise
        flags = [int(col == key) for col in cols[1:]]
        # Print the key followed by its flags, separated by spaces
        print(key, *flags)

With a.txt:

 esd    dus esd muq
 uum    uum dus esd
 dus    esd uum dus
 muq    muq muq uum

Output:

esd 0 1 0
uum 1 0 0
dus 0 0 1
muq 1 1 0

Answer 1 (score: 1)

If the first item on each line of file A is what you are looking for, then you do not need file B.

result = []
for line in open('input.txt').readlines():
    tokens = line.split()
    seek = tokens[0]  # We seek occurrences of the first token in the row.
    row = [seek]      # This array stores pieces of output.
    for item in tokens[1:]:
        if item == seek:
            row.append('1')   # Note that these are strings, not integers.
        else:                 # You might like to replace them with other
            row.append('0')   #   values such as 'Y'/'N' or 'T'/'F'.
    result.append(row)
lines = ['  '.join(row) for row in result]  # Making lines of output.
text = '\n'.join(lines)                     # Gluing the lines together.
print(text)                                 # Printing for verification.
with open('output.txt', 'w') as out_file:   # Then writing to file.
    out_file.write(text+'\n')

The code above takes this input:

esd    dus esd muq
uum    uum dus esd
dus    esd uum dus
muq    muq muq uum

and produces this output:

esd  0  1  0
uum  1  0  0
dus  0  0  1
muq  1  1  0
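
The same row-wise comparison can also be wrapped in a small function, which makes it easy to check without touching the disk. This is only a minimal sketch: the name `presence_row` and the in-memory `sample` list are illustrative and not part of the original answer, and the header line of file A is assumed to have been removed.

```python
def presence_row(line):
    """Return [key, flag, flag, ...] for one whitespace-separated line.

    The key is the first token; each flag is 1 if the corresponding
    remaining token equals the key, and 0 otherwise.
    """
    tokens = line.split()
    key = tokens[0]
    return [key] + [int(tok == key) for tok in tokens[1:]]


# Sample rows copied from the question's file A (header line omitted).
sample = [
    "esd    dus esd muq",
    "uum    uum dus esd",
    "dus    esd uum dus",
    "muq    muq muq uum",
]

matrix = [presence_row(line) for line in sample]
for row in matrix:
    # Join the key and its integer flags with tabs for display.
    print("\t".join(str(cell) for cell in row))
```

Keeping the flags as integers, rather than the strings used above, means the rows can be summed or passed to other code directly; convert with `str()` only when writing them out.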

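Answer 2 (score: 0)

If the column in B does not necessarily match the first column in A, you can call next on either file so that the two files are read in step: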

fileA = 'fileA.tsv'
fileB = 'fileB.tsv'
outfilename = 'outfile.tsv'

with open(fileA) as fa:
    with open(fileB) as fb:
        with open(outfilename, 'w') as outfile:
            for line in fb:
                corresp_a_line = next(fa)  # built-in next() works in Python 2.6+ and 3
                fields = corresp_a_line.split()
                outfile.write(fields[0])  # write column 1
                for field in fields[1:]:
                    outfile.write("\t{}".format(int(line.strip() in field)))
                outfile.write("\n")
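
A possible Python 3 variant of the same idea is sketched below. It pairs the lines of the two files with `zip()` instead of calling next by hand, and it compares with `==` rather than `in`, since `in` on strings is a substring test (for example, `'us' in 'dus'` is true). The function name `paired_matrix` and the `io.StringIO` stand-ins for the two files are assumptions made for illustration. The sketch also assumes that file A has no header line and that both files have the same number of lines.

```python
import io


def paired_matrix(file_a, file_b):
    """Yield one tab-separated output line per pair of input lines."""
    # zip() reads both iterables in step and stops at the shorter one.
    for a_line, b_line in zip(file_a, file_b):
        fields = a_line.split()
        if not fields:
            continue  # skip a pair whose file A line is blank
        wanted = b_line.strip()
        # Exact equality against the item from file B, not a substring test.
        flags = [str(int(field == wanted)) for field in fields[1:]]
        yield "\t".join([fields[0]] + flags)


# In-memory stand-ins for fileA and fileB, using the question's data.
file_a = io.StringIO(
    "esd dus esd muq\n"
    "uum uum dus esd\n"
    "dus esd uum dus\n"
    "muq muq muq uum\n"
)
file_b = io.StringIO("esd\nuum\ndus \nmuq\n")

result = list(paired_matrix(file_a, file_b))
print("\n".join(result))
```

With real files, the two `StringIO` objects would be replaced by file objects opened in a `with` block, and each yielded line written to the output file followed by a newline.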