Python - Get specific lines from a file

Time: 2016-02-20 13:06:59

Tags: python

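How do I get specific lines from a file in Python? I know how to read a file and turn it into a list and so on, but this one is a bit harder for me. Let me explain what I need: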

I have a file that looks like this:

  

lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [protein=Gag] [protein_id=AAC82593.1] [location=336..1838]
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGATGGGAAAAAATTCGGTTAAGGCCAG
GGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAA
TCACTCTTTGGCAACGACCCCTCGTCACAATAA
lcl|AF033819.3_cds_AAC82598.2_2 [gene=pol] [protein=Pol] [partial=5'] [protein_id=AAC82598.2] [location=<1631..4642]
TTTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCA
ACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCCCCTCAGAAGCAGGAGCCGA
lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif] [protein=Vif] [protein_id=AAC82594.1] [location=4587..5165]
ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAACATGGAAAAGTT
TAGTAAAACACCATATGTATGTTTCAGGGAAAGCTAGGGGATGGTTTTATAGACATCACTATGAAAGCCC

I need to remove every line that contains something like this:

  

lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif] [protein=Vif] [protein_id=AAC82594.1] [location=4587..5165]

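I need all the sequence letters stored in a list, a file, etc., and I know how to do that part. Can anyone help me with the Python code? How do I remove only the lines that contain: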

  

lcl

2 answers:

Answer 0 (score: 0)

The answer is to use regular expressions. Something like this:

>>> import re
>>> a = 'beginlcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [protein=Gag] [protein_id=AAC82593.1] [location=336..1838]end'
>>> re.sub(r'lcl.*?location.*?\]', '', a)
'beginend'
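The pattern above strips only the header text and leaves whatever follows it on the same line. If the goal is to drop whole header lines from the file's contents, a variant along these lines may work, assuming each header starts at the beginning of its own line with lcl| (the name drop_lcl_lines is made up for illustration):

```python
import re


def drop_lcl_lines(text):
    """Remove every line that starts with 'lcl|', including its line break.

    Hypothetical helper: assumes headers always begin at the start of a line.
    """
    # With re.MULTILINE, ^ matches at the start of every line; '.' does not
    # match a newline, so '.*' stops at the end of the header line, and the
    # optional '\n?' removes the line break too (absent on a final line).
    return re.sub(r"^lcl\|.*\n?", "", text, flags=re.MULTILINE)


sample = "lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag]\nATGGGTGCG\nAGAGCGTCA\n"
print(drop_lcl_lines(sample))
```

For a real file, read its contents first (for example inside a with open('lcl.txt') as f: block, pass f.read()) and write the result wherever it is needed.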

Answer 1 (score: 0)

Why not use startswith()?

with open('lcl.txt', 'r') as f:
    for line in f:  # iterate over the file directly instead of readlines()
        if line.startswith("lcl|"):
            print("lcl line dropping it")
            continue
        print(line)
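The question also asks for the sequence letters in a list. Building on the same startswith() check, here is a minimal sketch that joins each record's sequence lines into one string. It assumes every header line starts with lcl| (possibly after leading whitespace) and that every other non-blank line is sequence; the name read_sequences is hypothetical.

```python
def read_sequences(lines):
    """Return one sequence string per 'lcl|' record found in lines.

    lines can be any iterable of strings, e.g. an open file object.
    """
    sequences = []
    current = []  # sequence lines of the record being read
    for line in lines:
        line = line.strip()  # drop surrounding whitespace and the newline
        if not line:
            continue  # skip blank lines
        if line.startswith("lcl|"):
            # A new header: store the previous record, if any, and start over.
            if current:
                sequences.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))  # last record in the input
    return sequences


with_headers = [
    "lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag]\n",
    "ATGGGTGCG\n",
    "AGAGCGTCA\n",
    "lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif]\n",
    "ATGGAAAAC\n",
]
print(read_sequences(with_headers))
```

Because it accepts any iterable of lines, the same function can be called on an open file, e.g. seqs = read_sequences(f) inside a with open('lcl.txt') as f: block.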

Result:

lcl line dropping it
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGATGGGAAAAAATTCGGTTAAGGCCAG GGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCACTCTTTGGCAACGACCCCTCGTCACAATAA

lcl line dropping it
TTTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCA ACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCCCCTCAGAAGCAGGAGCCGA

lcl line dropping it
ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAACATGGAAAAGTT TAGTAAAACACCATATGTATGTTTCAGGGAAAGCTAGGGGATGGTTTTATAGACATCACTATGAAAGCCC

Note: I am assuming the file has newlines in the right places!
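If the newlines are not in those places (the sample in the question looks as if it was flattened onto a couple of lines), splitting on the headers themselves avoids relying on line breaks at all. This is a sketch under the assumption that a header is lcl| followed by an ID without spaces and then any number of [key=value] groups; HEADER and sequences_from_text are illustrative names, not part of either answer.

```python
import re

# Assumed header shape: "lcl|" + ID (no spaces) + zero or more "[...]" groups.
# The group is non-capturing, so re.split() does not return header parts.
HEADER = re.compile(r"lcl\|\S+(?:\s*\[[^\]]*\])*")


def sequences_from_text(text):
    """Return one sequence string per record, ignoring all whitespace."""
    chunks = HEADER.split(text)
    # Skip blank chunks (such as the empty text before the first header) and
    # remove spaces and newlines inside each remaining chunk.
    return ["".join(chunk.split()) for chunk in chunks if chunk.strip()]


flat = (
    "lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [location=336..1838]   "
    "ATGGGTGCG   AGAGCGTCA   "
    "lcl|AF033819.3_cds_AAC82598.2_2 [gene=pol] [partial=5'] "
    "[location=<1631..4642]   TTTTTTAGG"
)
print(sequences_from_text(flat))
```

This would break if a bracketed value ever contained a "]" character, which does not happen in the sample shown.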