Use a list in another text file (Text B) to look up names in a text file (Text A) and assign values (Python)

Date: 2016-07-04 07:46:29

Tags: python find assign names

I am new to Python and I need your help.

I have 2 different text files. They are Text_A.txt and Text_B.txt.

Text_A.txt contains a list of names like the following (they are tab-delimited):

Sequence_1 Sequence_2 Sequence_3 Sequence_4 Sequence_5 Sequence_6 Sequence_7 Sequence_8

And Text_B.txt contains a list of names like the following (one sequence name per line):

Sequence_1
Sequence_2
Sequence_3
Sequence_4
Sequence_5
Sequence_6
Sequence_7
Sequence_8
Sequence_9
Sequence_10
Sequence_11

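What I want to do is assign "1" next to a sequence name in Text_B.txt if the name is in Text_A.txt, and assign "0" next to the sequence name in Text_B.txt if the name is not in Text_A.txt.

So, using the example above, the expected output looks like this (each name and its corresponding value should be written on its own line):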

Sequence_1; 1
Sequence_2; 1
Sequence_3; 1
Sequence_4; 1
Sequence_5; 1
Sequence_6; 1
Sequence_7; 1
Sequence_8; 1
Sequence_9; 0
Sequence_10; 0
Sequence_11; 0

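I would like the output in .txt format.

How should I do this in Python?

I really need your help here, because I have more than 3000 and 6000 names in the Text_A.txt and Text_B.txt files, respectively.

Thank you very much!

1 Answer:

Answer 0: (score: 0)

You can do the following: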

# read each file assuming that your sequence of strings 
# is the first line respectively
with open('Text_A.txt', 'r') as f:
    seqA = f.readline()
with open('Text_B.txt', 'r') as f:
    seqB = f.readline()

# remove end-of-line character
seqA = seqA.strip('\n')
seqB = seqB.strip('\n')

# so far, seqA and seqB are strings. split them now on tabs
seqA = seqA.split('\t')
seqB = seqB.split('\t')

# now, seqA and seqB are list of strings
# since you want to use seqA as a lookup, you should make a set out of seqA
seqA = set( seqA )

# now iterate over each item in seqB and check if it is present in seqA
# store result in a list
out = []
for item in seqB:
    is_present = 1 if item in seqA else 0
    out.append('{item}:{is_present}\n'.format(item=item, is_present=is_present))

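# write result to file; each entry already ends with a newline,
# so the entries are concatenated without an extra separator
with open('output.txt','w') as f:
    f.write( ''.join( out ) )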
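
The question describes Text_B.txt as having one name per line, whereas the code above reads only the first line of each file. A minimal sketch for that layout follows. It assumes that the names themselves contain no whitespace, that Text_A.txt is separated by tabs or other whitespace, and that the `name; value` format from the question is wanted. The helper names `label_names` and `write_labels` are hypothetical.

```python
def label_names(text_a, text_b):
    """Return a list of 'name; flag' strings, one per name in text_b.

    text_a: contents of Text_A.txt (names separated by tabs/whitespace)
    text_b: contents of Text_B.txt (one name per line)
    Assumption: names contain no whitespace.
    """
    # str.split() with no argument splits on any run of whitespace
    lookup = set(text_a.split())
    lines = []
    for raw in text_b.splitlines():
        name = raw.strip()
        if not name:
            # skip blank lines
            continue
        flag = 1 if name in lookup else 0
        lines.append('{}; {}'.format(name, flag))
    return lines


def write_labels(a_path, b_path, out_path):
    """Read both files, label the names of the second, write the result."""
    with open(a_path) as f:
        text_a = f.read()
    with open(b_path) as f:
        text_b = f.read()
    with open(out_path, 'w') as f:
        f.write('\n'.join(label_names(text_a, text_b)) + '\n')
```

Under these assumptions, calling `write_labels('Text_A.txt', 'Text_B.txt', 'output.txt')` would write one labelled name per line.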

If your sequences contain millions of entries, you should consider a more advanced approach.
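
The answer does not say what a more advanced approach would look like. One possible reading, sketched here purely as an assumption, is to avoid building the whole output in memory: keep only the names from Text_A.txt in a set and stream Text_B.txt line by line. The function names `iter_labels` and `stream_labels` are hypothetical, and the sketch assumes one name per line in Text_B.txt and no whitespace inside names.

```python
def iter_labels(lookup, lines):
    """Lazily yield 'name; flag' lines (newline-terminated).

    lookup: set of names taken from Text_A.txt
    lines:  any iterable of lines, e.g. an open file object
    """
    for raw in lines:
        name = raw.strip()
        if name:  # ignore blank lines
            yield '{}; {}\n'.format(name, 1 if name in lookup else 0)


def stream_labels(a_path, b_path, out_path):
    # Only the lookup set is held in memory; Text_B is read lazily.
    with open(a_path) as f:
        lookup = set(f.read().split())
    with open(b_path) as src, open(out_path, 'w') as dst:
        # writelines accepts any iterable, so the generator is
        # consumed one line at a time
        dst.writelines(iter_labels(lookup, src))
```

A call such as `stream_labels('Text_A.txt', 'Text_B.txt', 'output.txt')` would then process the second file without holding all of its lines at once. For the roughly 3000 and 6000 names mentioned in the question, the simpler in-memory version is likely sufficient.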