Question

I have a file like the following.

Sequence A.1.1 Bacteria
  ATGCGCGATATAGGCCT
  ATTATGCGCGCGCGC

Sequence A.1.2 Virus
  ATATATGCGCCGCGCGTA
  ATATATATGCGCGCCGGC

Sequence B.1.21 Chimpanzee
  ATATAGCGCGCGCGCGAT
  ATATATATGCGCG

Sequence C.21.4 Human
  ATATATATGCCGCGCG
  ATATAATATC

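I want to create separate files for the A, B and C sequences from this one file. Please suggest some reading material for cracking this problem. Thanks. The output should be three files: one for 'A', a second for the sequences with 'B', and a third for the sequences with 'C'.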

Answer 1

It's not 100% clear what you want to do, but something like:

currout = None
seqname2file = dict()

with open('thefilewhosenameyoudonottellus.txt') as infile:
  for line in infile:
    if line.startswith('Sequence '):
      seqname = line[9]  # A or B or C
      if seqname not in seqname2file:
        filename = 'outputfileforsequence_%s.txt' % seqname
        seqname2file[seqname] = open(filename, 'w')
      currout = seqname2file[seqname]
    if currout is not None:  # skip any lines before the first header
      currout.write(line)

for f in seqname2file.values():
  f.close()

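should get you pretty close. If you want three separate files (one each for A, B and C) that between them contain all the lines from the input file, it's just about done, except that you'll probably need better filenames (but you're not letting us in on the secret of what those might be ;-). Otherwise, a few tweaks should get it there.

BTW, it always helps immensely (to help you more effectively rather than stumbling in the dark and guessing) if you also give an example of the output you want for your example input data! :-)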
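
For what it's worth, here is a minimal sketch of the kind of input/output example meant above, assuming the headers in the question read `Sequence A.1.1 Bacteria` and so on. The names `group_lines` and `prefix` are made up for illustration. It groups the lines in memory instead of writing files, so the result is easy to inspect before committing to filenames.

```python
from collections import defaultdict

def group_lines(lines, prefix="Sequence "):
    """Group lines by the single letter that follows `prefix` on header lines."""
    groups = defaultdict(list)
    current = None
    for line in lines:
        if line.startswith(prefix) and len(line) > len(prefix):
            current = line[len(prefix)]  # 'A', 'B' or 'C' in the sample
        if current is not None:  # lines before the first header are skipped
            groups[current].append(line)
    return dict(groups)

# Sample input, assumed to match the file shown in the question
sample = """Sequence A.1.1 Bacteria
  ATGCGCGATATAGGCCT
  ATTATGCGCGCGCGC

Sequence A.1.2 Virus
  ATATATGCGCCGCGCGTA
  ATATATATGCGCGCCGGC

Sequence B.1.21 Chimpanzee
  ATATAGCGCGCGCGCGAT
  ATATATATGCGCG

Sequence C.21.4 Human
  ATATATATGCCGCGCG
  ATATAATATC
"""

groups = group_lines(sample.splitlines(keepends=True))
for letter, group in sorted(groups.items()):
    print(letter, len(group), "lines")
```

Each list in `groups` can then be written out with `writelines` once you have decided on the filenames. Note that the blank separator lines stay with the group that precedes them.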

Answer 2

I'm not sure exactly what output you want, but it sounds like you need something like this:

#!/usr/bin/python

import re

# Open the input file
fhIn = open("input_file.txt", "r")

# Open the output files and store their handles in a dictionary
fhOut = {}
fhOut['A'] = open("sequence_a.txt", "w")
fhOut['B'] = open("sequence_b.txt", "w")
fhOut['C'] = open("sequence_c.txt", "w")

# Create a regexp to find the line naming the sequence
Matcher = re.compile(r'^Sequence (?P<sequence>[A-C])')

# Iterate through each line in the file
CurrentSequence = None
for line in fhIn:
    # If the line is a sequence identifier...
    m = Matcher.match(line)
    if m is not None:
        # Select the appropriate sequence from the regexp match
        CurrentSequence = m.group('sequence')
    # Uncomment the following two lines to skip blank lines
    # if len(line.strip()) == 0:
    #     continue
    # Write the line to the current sequence's output file
    # (change to elif if you don't want to write the sequence titles)
    if CurrentSequence is not None:
        fhOut[CurrentSequence].write(line)

# Close all the file handles
fhIn.close()
fhOut['A'].close()
fhOut['B'].close()
fhOut['C'].close()

Completely untested, though...
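
If hard-coding the three output files bothers you, a variation on the same regexp idea could open each output file the first time its letter shows up and let `contextlib.ExitStack` close everything, even if an error occurs halfway through. This is only a sketch under assumptions: the function name `split_file`, the `sequence_<letter>.txt` naming and the `[A-Z]` pattern are my own choices, and the demo uses a tiny made-up input in a temporary directory.

```python
import re
import tempfile
from contextlib import ExitStack
from pathlib import Path

# Header lines look like "Sequence A.1.1 Bacteria"; capture the group letter
HEADER = re.compile(r"^Sequence (?P<group>[A-Z])")

def split_file(in_path, out_dir):
    """Write one sequence_<letter>.txt in out_dir per group letter in in_path."""
    out_dir = Path(out_dir)
    handles = {}
    current = None
    with ExitStack() as stack:  # closes every file registered on it
        source = stack.enter_context(open(in_path, "r"))
        for line in source:
            m = HEADER.match(line)
            if m is not None:
                current = m.group("group")
                if current not in handles:
                    target = out_dir / f"sequence_{current.lower()}.txt"
                    handles[current] = stack.enter_context(open(target, "w"))
            if current is not None:  # ignore lines before the first header
                handles[current].write(line)
    return sorted(handles)

# Small demo on made-up data
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input_file.txt"
    src.write_text(
        "Sequence A.1.1 Bacteria\n  ATGC\n"
        "Sequence B.1.21 Chimpanzee\n  ATAT\n"
        "Sequence A.1.2 Virus\n  GCGC\n"
    )
    letters = split_file(src, tmp)
    file_a = (Path(tmp) / "sequence_a.txt").read_text()
    print(letters)
```

Opening the files lazily means no empty output file is left behind for a letter that never appears in the input.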

Generating multiple files from a single file in Python
