Python: if line.startswith("word"), check the line 20 lines below it and print it

Time: 2019-10-31 14:02:54

Tags: python

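I'm trying to go through a file. If a line starts with "SeqID", I want to look at the line 21 lines after it. If that line starts with anything other than "Cytoplasmic", I want to write both the SeqID line and that line to a file.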

So far I have this:

import sys
import argparse
import operator
import re
import itertools

def main (argv):
    parser = argparse.ArgumentParser(description='find a location')
    parser.add_argument('infile', help='file to process')
    parser.add_argument('outfile', help='file to produce')
    args = parser.parse_args()
    tag = "SeqID:"
    tag2 = "Cytoplasmic"

    with open(args.infile, "r") as f,open(args.outfile,"w+") as of:
        file_in = f.readlines()
        for line in file_in:
            if line.startswith(tag) and line[21:] != "Cytoplasmic":
                of.write(line)

if __name__ == "__main__":
    main(sys.argv)
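Why the condition above never filters anything: `line[21:]` slices characters out of the current line instead of fetching another line, so `line[21:] != "Cytoplasmic"` is practically always true and every SeqID line gets written. A short illustration, using a hypothetical shortened header line and filler lines:

```python
# Hypothetical, shortened header line in the style of the sample input.
line = "SeqID: YP_008914846.1 opacity protein\n"

# line[21:] is a CHARACTER slice of this same line, so it can never be
# the prediction line further down the file.
print(repr(line[21:]))  # -> ' opacity protein\n'

# To read "the line 21 lines later", keep all lines in a list and index it.
lines = ["SeqID: demo\n"] + ["filler\n"] * 20 + ["    OuterMembrane 10.00\n"]
for i, text in enumerate(lines):
    if text.startswith("SeqID:") and i + 21 < len(lines):
        print(lines[i + 21].strip())  # -> OuterMembrane 10.00
```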

Here is a sample of the input file:

SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]
  Analysis Report:
    CMSVM-            Unknown                       [No details]
    CytoSVM-          Unknown                       [No details]
    ECSVM-            Unknown                       [No details]
    ModHMM-           Unknown                       [No internal helices found]
    Motif-            Unknown                       [No motifs found]
    OMPMotif-         Unknown                       [No motifs found]
    OMSVM-            OuterMembrane                 [No details]
    PPSVM-            Unknown                       [No details]
    Profile-          Unknown                       [No matches to profiles found]
    SCL-BLAST-        OuterMembrane                 [matched 60392864: Opacity protein opA54 precursor]
    SCL-BLASTe-       Unknown                       [No matches against database]
    Signal-           Unknown                       [No signal peptide detected]
  Localisation Scores:
    OuterMembrane          10.00
    Extracellular          0.00
    Periplasmic            0.00
    Cytoplasmic            0.00
    CytoplasmicMembrane    0.00
  Final Prediction:
    OuterMembrane          10.00

-------------------------------------------------------------------------------

SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]
  Analysis Report:
    CMSVM-            Unknown                       [No details]
    CytoSVM-          Unknown                       [No details]
    ECSVM-            Unknown                       [No details]
    ModHMM-           Unknown                       [No internal helices found]
    Motif-            Unknown                       [No motifs found]
    OMPMotif-         Unknown                       [No motifs found]
    OMSVM-            Unknown                       [No details]
    PPSVM-            Unknown                       [No details]
    Profile-          Unknown                       [No matches to profiles found]
    SCL-BLAST-        Unknown                       [No matches against database]
    SCL-BLASTe-       Unknown                       [No matches against database]
    Signal-           Unknown                       [No signal peptide detected]
  Localization Scores:
    CytoplasmicMembrane    2.00
    Cytoplasmic            2.00
    OuterMembrane          2.00
    Periplasmic            2.00
    Extracellular          2.00
  Final Prediction:
    Unknown
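Every record in this sample ends with a `Final Prediction:` header followed by the predicted location, so an alternative to counting a fixed 21 lines is to remember the most recent `SeqID:` line and read the first non-blank line after that header. This is a swapped-in technique, not one from the answers below; `filter_predictions` is a hypothetical name, and the sketch assumes every record has `SeqID:` and `Final Prediction:` lines laid out as in the sample.

```python
def filter_predictions(lines, excluded="Cytoplasmic"):
    """Return (seqid_line, prediction) pairs whose final prediction does not
    start with `excluded`. Works on any iterable of lines, e.g. an open file."""
    results = []
    seqid = None
    want_prediction = False
    for line in lines:
        stripped = line.strip()
        if line.startswith("SeqID:"):
            seqid = line.rstrip("\n")
            want_prediction = False
        elif stripped == "Final Prediction:":
            want_prediction = True
        elif want_prediction and stripped:
            # First non-blank line after the header is the prediction.
            if seqid is not None and not stripped.startswith(excluded):
                results.append((seqid, stripped))
            want_prediction = False
    return results


# Abbreviated copy of the two sample records plus a made-up third record
# (EXAMPLE_3 is hypothetical) whose prediction is Cytoplasmic.
sample = """\
SeqID: YP_008914846.1 opacity protein [Neisseria gonorrhoeae FA 1090]
  Final Prediction:
    OuterMembrane          10.00

-------------------------------------------------------------------------------

SeqID: YP_008914847.1 hypothetical protein NGO0146a [Neisseria gonorrhoeae FA 1090]
  Final Prediction:
    Unknown

-------------------------------------------------------------------------------

SeqID: EXAMPLE_3 made-up record for illustration
  Final Prediction:
    Cytoplasmic            10.00
"""
for seqid, prediction in filter_predictions(sample.splitlines()):
    print(seqid)
    print("    " + prediction)
```

Because a file object yields lines when iterated, `filter_predictions(f)` can be called directly on `open(args.infile)`; the record length then no longer matters.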

2 answers:

Answer 0 (score: 1)

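My Python is a bit rusty, so please bear with me. I hope I've inferred the desired output correctly; if not, please leave a comment.

This assumes that the samples from your sequencing experiment are always separated by 3 lines of arbitrary content, with 22 lines per sample.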

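The fixed-layout assumption in this answer can be sketched as a stride over the list of lines. A minimal version, assuming the first record starts at the very first line of the file; `fixed_stride_pairs`, `RECORD_LEN` and `SEPARATOR_LEN` are hypothetical names, and the 22/3 split matches the two sample records above (a blank line, the dashed line and another blank line between records).

```python
# Layout assumed by the answer above.
RECORD_LEN = 22     # "SeqID:" line through the final prediction line
SEPARATOR_LEN = 3   # blank line, dashed line, blank line


def fixed_stride_pairs(lines, excluded="Cytoplasmic"):
    """Return (seqid_line, prediction_line) pairs, assuming the first record
    starts at index 0 and records repeat every RECORD_LEN + SEPARATOR_LEN lines."""
    stride = RECORD_LEN + SEPARATOR_LEN
    pairs = []
    # Stop early enough that start + RECORD_LEN - 1 is still a valid index.
    for start in range(0, len(lines) - RECORD_LEN + 1, stride):
        seqid = lines[start]
        prediction = lines[start + RECORD_LEN - 1]  # last line of the record
        if not prediction.strip().startswith(excluded):
            pairs.append((seqid, prediction))
    return pairs


# Hypothetical demo: two records padded with filler lines.
demo = (["SeqID: A"] + ["filler"] * 20 + ["    OuterMembrane 10.00"]
        + ["", "-" * 10, ""]
        + ["SeqID: B"] + ["filler"] * 20 + ["    Cytoplasmic 9.00"])
print(fixed_stride_pairs(demo))  # -> [('SeqID: A', '    OuterMembrane 10.00')]
```

The stride approach is simple but silently misreads the file if any record has a different number of lines, which is why the offset check below verifies each `SeqID:` line individually.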

Answer 1 (score: 1)

You could try something like this inside your main():

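    # `args` and `tag` come from the question's main().
    with open(args.infile, "r") as f, open(args.outfile, "w") as of:
        file_in = f.readlines()
        for i, line in enumerate(file_in):
            # The prediction line sits 21 lines below the "SeqID:" line;
            # parentheses let the condition span several lines.
            if (line.startswith(tag)
                    and i + 21 < len(file_in)
                    and not file_in[i + 21].strip().startswith("Cytoplasmic")):
                of.write(line)
                of.write(file_in[i + 21])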
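The same check can be wrapped in a function and tried on in-memory lines instead of files. A sketch under stated assumptions: `pairs_with_offset` and its `exact` flag are hypothetical, and the default offset of 21 matches the sample records in the question. One design point worth knowing: `startswith("Cytoplasmic")` also skips `CytoplasmicMembrane` predictions, so the function offers an exact first-word comparison for the case where only plain `Cytoplasmic` should be excluded.

```python
def pairs_with_offset(lines, tag="SeqID:", offset=21,
                      excluded="Cytoplasmic", exact=False):
    """Return (tag_line, later_line) pairs, where later_line sits `offset`
    lines after a line starting with `tag` and its first word is not excluded."""
    pairs = []
    for i, line in enumerate(lines):
        j = i + offset
        if not line.startswith(tag) or j >= len(lines):
            continue
        words = lines[j].split()
        first = words[0] if words else ""
        # exact=False: skip anything starting with `excluded`, which also skips
        # "CytoplasmicMembrane"; exact=True: skip only the exact word.
        skip = first == excluded if exact else first.startswith(excluded)
        if not skip:
            pairs.append((line, lines[j]))
    return pairs


# Hypothetical demo record whose prediction is CytoplasmicMembrane.
demo = ["SeqID: A\n"] + ["filler\n"] * 20 + ["    CytoplasmicMembrane 9.82\n"]
print(pairs_with_offset(demo))                   # -> []
print(len(pairs_with_offset(demo, exact=True)))  # -> 1
```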