I have a file like the following.
Sequence A.1.1 Bacteria
ATGCGCGATATAGGCCT
ATTATGCGCGCGCGC
Sequence A.1.2 Virus
ATATATGCGCCGCGCGTA
ATATATATGCGCGCCGGC
Sequence B.1.21 Chimpanzee
ATATAGCGCGCGCGCGAT
ATATATATGCGCG
Sequence C.21.4 Human
ATATATATGCCGCGCG
ATATAATATC
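I want to create separate files for the class A, B and C sequences from this one file. Please suggest some reading material that would help me crack this. Thanks. The output should be three files: one for the 'A' sequences, a second for the 'B' sequences, and a third for the 'C' sequences.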
Answer 0 (score: 2)
It's not 100% clear what you want to do, but something like:
currout = None
seqname2file = dict()

with open('thefilewhosenameyoudonottellus.txt') as infile:
    for line in infile:
        if line.startswith('Sequence '):
            seqname = line[9]  # A or B or C
            if seqname not in seqname2file:
                # first time we see this class: open its output file
                filename = 'outputfileforsequence_%s.txt' % seqname
                seqname2file[seqname] = open(filename, 'w')
            currout = seqname2file[seqname]
        # skip any lines that come before the first 'Sequence ' header
        if currout is not None:
            currout.write(line)

for f in seqname2file.values():
    f.close()
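should get you pretty close. If you want three separate files (one each for A, B and C) that between them contain every line of the input file, it's just about done, except that you'll probably want better file names (but you're not letting us in on the secret of what those might be ;-). Otherwise, a few tweaks should get you there.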
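The same idea can be wrapped in a function so that every file is closed even if something fails partway through. This is only a sketch under assumptions: the name `split_by_class`, the `sequence_<letter>.txt` naming scheme, and the use of `contextlib.ExitStack` are illustrative choices, not something the question specifies.

```python
import os
from contextlib import ExitStack


def split_by_class(input_path, output_dir, prefix="Sequence "):
    """Copy each record into a file named after its class letter.

    Returns a dict mapping class letter -> output file path.
    """
    paths = {}
    handles = {}
    with ExitStack() as stack:
        source = stack.enter_context(open(input_path))
        current = None
        for line in source:
            # The class letter is the single character right after the prefix.
            letter = line[len(prefix):len(prefix) + 1]
            if line.startswith(prefix) and letter.isalpha():
                if letter not in handles:
                    # Hypothetical naming scheme; adjust to taste.
                    name = "sequence_%s.txt" % letter
                    paths[letter] = os.path.join(output_dir, name)
                    handles[letter] = stack.enter_context(
                        open(paths[letter], "w"))
                current = handles[letter]
            # Lines before the first header have nowhere to go and are skipped.
            if current is not None:
                current.write(line)
    return paths
```

Calling `split_by_class("input_file.txt", ".")` would write one file per class letter found in the input into the current directory and return their paths.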
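By the way, it always helps a lot if you also give an example of the output you want for your sample input data (it lets us help you more effectively instead of stumbling around in the dark and guessing)! ;-)
Answer 1 (score: 0)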
I'm not sure exactly what output you want, but it sounds like you need something like this:
#!/usr/bin/python
import re

# Open the input file
fhIn = open("input_file.txt", "r")

# Open the output files and store their handles in a dictionary
fhOut = {}
fhOut['A'] = open("sequence_a.txt", "w")
fhOut['B'] = open("sequence_b.txt", "w")
fhOut['C'] = open("sequence_c.txt", "w")

# Create a regexp to find the line naming the sequence
Matcher = re.compile(r'^Sequence (?P<sequence>[A-C])')

# Iterate through each line in the file
CurrentSequence = None
for line in fhIn:
    # If the line is a sequence identifier...
    m = Matcher.match(line)
    if m is not None:
        # Select the appropriate sequence from the regexp match
        CurrentSequence = m.group('sequence')
    # Uncomment the following two lines to skip blank lines
    # (this only takes effect if the `if` below is also changed to `elif`)
    # elif len(line.strip()) == 0:
    #     pass
    # Print out the line to the current sequence output file
    # (change `if` to `elif` if you don't want to print the sequence titles)
    if CurrentSequence is not None:
        fhOut[CurrentSequence].write(line)

# Close all the file handles
fhIn.close()
fhOut['A'].close()
fhOut['B'].close()
fhOut['C'].close()
Completely untested, though...
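One way to check the matching logic without touching the disk is to group the lines in memory first and write them out afterwards. The sketch below assumes the headers look like `Sequence A.1.1 ...`; the function name `group_by_class`, the widened `[A-Z]` pattern, and the sample text (modeled on the data in the question) are illustrative assumptions.

```python
import re
from collections import defaultdict

# Same idea as the pattern above, widened to any capital letter (an assumption).
HEADER = re.compile(r"^Sequence (?P<sequence>[A-Z])")


def group_by_class(lines):
    """Return a dict mapping class letter -> list of lines in that class."""
    groups = defaultdict(list)
    current = None
    for line in lines:
        m = HEADER.match(line)
        if m is not None:
            current = m.group("sequence")
        # Lines seen before any header are dropped.
        if current is not None:
            groups[current].append(line)
    return dict(groups)


SAMPLE = """\
Sequence A.1.1 Bacteria
ATGCGCGATATAGGCCT
ATTATGCGCGCGCGC
Sequence A.1.2 Virus
ATATATGCGCCGCGCGTA
ATATATATGCGCGCCGGC
Sequence B.1.21 Chimpanzee
ATATAGCGCGCGCGCGAT
ATATATATGCGCG
Sequence C.21.4 Human
ATATATATGCCGCGCG
ATATAATATC
"""

groups = group_by_class(SAMPLE.splitlines(keepends=True))
for letter in sorted(groups):
    print(letter, len(groups[letter]), "lines")
```

Each list in `groups` can then be written to its own file with `writelines`, which keeps the parsing step easy to test separately from the file handling.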