I am new to Python and I need your help.
I have 2 different text files, Text_A.txt and Text_B.txt.
Text_A.txt contains a list of names like the following (they are tab-delimited):
Sequence_1 Sequence_2 Sequence_3 Sequence_4 Sequence_5 Sequence_6 Sequence_7 Sequence_8
And Text_B.txt contains a list of names like the following (one sequence name per line):
Sequence_1 Sequence_2 Sequence_3 Sequence_4 Sequence_5 Sequence_6 Sequence_7 Sequence_8 Sequence_9 Sequence_10 Sequence_11
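What I want to do is write " 1" next to a sequence name from Text_B.txt if that name is in Text_A.txt, and " 0" next to it if the name is not in Text_A.txt.
So, using the example above, the expected output looks like this (each name and its value on its own line):
Sequence_1; 1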
Sequence_2; 1
Sequence_3; 1
Sequence_4; 1
Sequence_5; 1
Sequence_6; 1
Sequence_7; 1
Sequence_8; 1
Sequence_9; 0
Sequence_10; 0
Sequence_11; 0
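I would like the output in .txt format.
How should I do this in Python?
I really need your help here, because my Text_A.txt and Text_B.txt files contain more than 3000 and 6000 names respectively.
Thank you very much!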
Answer 0 (score: 0)
You can do the following:
# read Text_A.txt, assuming its tab-delimited names are all on the first line
with open('Text_A.txt', 'r') as f:
    seqA = f.readline()
# read Text_B.txt, which has one name per line
with open('Text_B.txt', 'r') as f:
    seqB = f.readlines()
# remove the end-of-line character and split the Text_A.txt line on tabs
seqA = seqA.strip('\n').split('\t')
# strip each Text_B.txt line and skip blank lines
seqB = [line.strip() for line in seqB if line.strip()]
# now, seqA and seqB are lists of strings
# since you want to use seqA as a lookup, you should make a set out of seqA
seqA = set(seqA)
# now iterate over each item in seqB and check if it is present in seqA
# store the result in a list
out = []
for item in seqB:
    is_present = 1 if item in seqA else 0
    out.append('{item}; {is_present}'.format(item=item, is_present=is_present))
# write the result to file, one "name; value" pair per line
with open('output.txt', 'w') as f:
    f.write('\n'.join(out) + '\n')
If your sequences contain millions of entries, you should consider a more advanced approach.
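For much larger inputs, one option is to stream the second file instead of building the whole result in memory first. The following is only a minimal sketch under some assumptions: the function name label_sequences and its three path parameters are hypothetical, names are assumed to contain no internal whitespace, and the lookup names from the first file are assumed to fit in memory as a set.

```python
def label_sequences(path_a, path_b, path_out):
    """Write 'name; 1' or 'name; 0' for every name found in path_b."""
    # Collect every name from the lookup file. split() with no argument
    # splits on tabs, spaces, and newlines, so either layout works.
    # Assumption: names contain no internal whitespace.
    with open(path_a, 'r') as f:
        known = set(f.read().split())

    # Read the second file line by line and write each result right away,
    # so only one line of it is held in memory at a time.
    with open(path_b, 'r') as src, open(path_out, 'w') as dst:
        for line in src:
            for name in line.split():
                flag = 1 if name in known else 0
                dst.write('{}; {}\n'.format(name, flag))
```

With the files from the question, this would be called as label_sequences('Text_A.txt', 'Text_B.txt', 'output.txt'). Because the membership test runs against a set, each lookup takes roughly constant time regardless of how many names the first file holds.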