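I have two files from which I want to build the presence (1) / absence (0) matrix shown below. If the item in fileB (or in col1 of fileA; I'm not sure which input is better) matches an item in cols 2-4, record a score of "1"; otherwise record "0".
File A: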
col1 col2 col3 col4
esd dus esd muq
uum uum dus esd
dus esd uum dus
muq muq muq uum
File B:
esd
uum
dus
muq
My attempt:
out_file=open("out.txt", "w")
for itemA in open("fileA", "r") as file1:
    file2=open("fileB", "r")
    for row in file2:
        for col in file2:
            if itemA==file2[row][col]:
                out_file.write(int(1))
            else:
                out_file.write(int(0))
Expected output:
    col1 col2 col3
esd 0    1    0
uum 1    0    0
dus 0    0    1
muq 1    1    0
Help with the Python code would be greatly appreciated.
Answer 0 (score: 1)
Would something like this work for you?
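with open('a.txt') as fh:
    for line in fh:
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        key = cols[0]
        # 1 if the column equals the first column, 0 otherwise
        flags = [int(col == key) for col in cols[1:]]
        # print() separates the values with spaces and ends with a newline
        print(key, *flags)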
With a.txt containing:
esd dus esd muq
uum uum dus esd
dus esd uum dus
muq muq muq uum
Output:
esd 0 1 0
uum 1 0 0
dus 0 0 1
muq 1 1 0
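In case it helps to check the logic without touching any files, here is a minimal sketch of the same comparison wrapped in a generator. The name `presence_lines` and the in-memory `sample` list are my own assumptions for illustration; they are not part of the question.

```python
def presence_lines(lines):
    """Yield one output line per non-empty input line.

    Each output line is the first token followed by 1/0 flags that say
    whether each remaining token equals that first token.
    """
    for line in lines:
        cols = line.split()
        if not cols:
            # Skip blank lines instead of failing on cols[0]
            continue
        key = cols[0]
        flags = [str(int(col == key)) for col in cols[1:]]
        yield " ".join([key] + flags)


# Hypothetical in-memory copy of the sample data from the question
sample = [
    "esd dus esd muq",
    "uum uum dus esd",
    "dus esd uum dus",
    "muq muq muq uum",
]
result = list(presence_lines(sample))
for out_line in result:
    print(out_line)
```

Because an open file is also an iterable of lines, passing a file handle instead of `sample` should work the same way.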
Answer 1 (score: 1)
If the first item in each line of file A is what you are looking for, then you don't need file B.
result = []
with open('input.txt') as in_file:
    for line in in_file:
        tokens = line.split()
        seek = tokens[0]  # We seek occurrences of the first token in the row.
        row = [seek]      # This list stores pieces of output.
        for item in tokens[1:]:
            if item == seek:
                row.append('1')  # Note that these are strings, not integers.
            else:                # You might like to replace them with other
                row.append('0')  # values such as 'Y'/'N' or 'T'/'F'.
        result.append(row)
lines = [' '.join(row) for row in result]  # Making lines of output.
text = '\n'.join(lines)                    # Gluing the lines together.
print(text)                                # Printing for verification.
with open('output.txt', 'w') as out_file:  # Then writing to file.
    out_file.write(text + '\n')
The code above takes this input:
esd dus esd muq
uum uum dus esd
dus esd uum dus
muq muq muq uum
and produces this output:
esd 0 1 0
uum 1 0 0
dus 0 0 1
muq 1 1 0
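If you do want markers other than '1' and '0', one possible way is to pass them in as parameters. This is only a sketch: `encode_row` and its `present`/`absent` parameters are hypothetical names I made up for the example.

```python
def encode_row(tokens, present="1", absent="0"):
    """Return [first token, marker, marker, ...] for one row of tokens."""
    seek = tokens[0]
    return [seek] + [present if item == seek else absent for item in tokens[1:]]


# Last row of the sample input, encoded with 'Y'/'N' instead of '1'/'0'
row = encode_row("muq muq muq uum".split(), present="Y", absent="N")
print(" ".join(row))  # prints: muq Y Y N
```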
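Answer 2 (score: 0)
If the items in B do not necessarily match the first column of A, you can call the built-in next() on either file object to read the two files in step:
fileA = 'fileA.tsv'
fileB = 'fileB.tsv'
outfilename = 'outfile.tsv'
with open(fileA) as fa, open(fileB) as fb, open(outfilename, 'w') as outfile:
    for line in fb:
        item = line.strip()        # the item from B for this row
        corresp_a_line = next(fa)  # advance A by one line, in step with B
        fields = corresp_a_line.split()
        outfile.write(fields[0])   # write column 1
        for field in fields[1:]:
            # 1 if the column equals B's item, 0 otherwise
            outfile.write("\t{}".format(int(item == field)))
        outfile.write("\n")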
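Another way to read the two inputs in step is `zip`, which pairs the n-th line of A with the n-th line of B. Below is a rough sketch on in-memory lists; the function name `match_against_b` and the sample lists are assumptions of mine. Note that `zip` stops silently at the shorter input, so it will not tell you if the two files have different lengths.

```python
def match_against_b(a_lines, b_lines):
    """Pair each line of A with the same-numbered line of B.

    Returns tab-separated rows: column 1 of A, then a 1/0 flag for each
    remaining column of A saying whether it equals the item from B.
    """
    rows = []
    for a_line, b_line in zip(a_lines, b_lines):
        fields = a_line.split()
        item = b_line.strip()
        if not fields:
            continue  # skip blank lines in A
        flags = [str(int(field == item)) for field in fields[1:]]
        rows.append("\t".join([fields[0]] + flags))
    return rows


# Hypothetical in-memory copies of the sample files from the question
a_lines = ["esd dus esd muq", "uum uum dus esd", "dus esd uum dus", "muq muq muq uum"]
b_lines = ["esd\n", "uum\n", "dus\n", "muq\n"]

rows = match_against_b(a_lines, b_lines)
print("\n".join(rows))
```

With the sample data, B happens to equal the first column of A, so the flags come out the same as in the other answers; they would differ only if B listed other items.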