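How do I get specific lines from a file in Python? I know how to read a file and put its contents into a list and so on, but this one is a bit difficult for me. Let me explain what I need:
I have a file that looks like this: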
lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [protein=Gag] [protein_id=AAC82593.1] [location=336..1838]
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGATGGGAAAAAATTCGGTTAAGGCCAG GGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAA TCACTCTTTGGCAACGACCCCTCGTCACAATAA
lcl|AF033819.3_cds_AAC82598.2_2 [gene=pol] [protein=Pol] [partial=5'] [protein_id=AAC82598.2] [location=<1631..4642]
TTTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCA ACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCCCCTCAGAAGCAGGAGCCGA
lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif] [protein=Vif] [protein_id=AAC82594.1] [location=4587..5165]
ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAACATGGAAAAGTT TAGTAAAACACCATATGTATGTTTCAGGGAAAGCTAGGGGATGGTTTTATAGACATCACTATGAAAGCCC
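I need to remove every line that contains something like this:
lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif] [protein=Vif] [protein_id=AAC82594.1] [location=4587..5165]
I need all the letters (the sequence data) stored in a list, a file, etc. I know how that part works. Can anyone help me with the Python code? How do I remove only the lines that contain:
lcl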
Answer 0 (score: 0)
One option is to use regular expressions. It would look something like this:
>>> import re
>>> a = 'beginlcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [protein=Gag] [protein_id=AAC82593.1] [location=336..1838]end'
>>> re.sub(r'lcl.*?location.*?\]', '', a)
'beginend'
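To apply the same idea to text that holds several headers, a minimal sketch might look like the following. It assumes every header starts with `lcl|` and ends with a `[location=...]` field, as in the example string above; the `strip_headers` helper and the shortened sample sequences are made up for illustration.

```python
import re

# Matches one header: from "lcl|" up to the "]" that closes [location=...].
# Assumption: every header ends with a [location=...] field.
HEADER_RE = re.compile(r"lcl\|.*?\[location=.*?\]")


def strip_headers(text):
    """Return text with every lcl header removed (hypothetical helper)."""
    return HEADER_RE.sub("", text)


# Shortened, made-up sequences; the headers are copied from the question.
sample = (
    "lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag] [protein=Gag] "
    "[protein_id=AAC82593.1] [location=336..1838] ATGGGTGCG AGAGCGTCA "
    "lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif] [protein=Vif] "
    "[protein_id=AAC82594.1] [location=4587..5165] ATGGAAAAC"
)

# split() with no arguments breaks on any whitespace and drops empty strings.
chunks = strip_headers(sample).split()
print(chunks)
```

Note that `.` does not match a newline by default, so this pattern only works when each header sits on a single line.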
Answer 1 (score: 0)
Why not use startswith()?
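with open('lcl.txt', 'r') as f:
    # Iterate over the file directly; readlines() would load it all at once
    for line in f:
        if line.startswith("lcl|"):
            # Header line: report it and skip it
            print("lcl line dropping it")
        else:
            # Sequence line: it already ends with a newline, so don't add another
            print(line, end="")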
Result:
lcl line dropping it
ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGATGGGAAAAAATTCGGTTAAGGCCAG GGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATTCGCAGTTAATCACTCTTTGGCAACGACCCCTCGTCACAATAA
lcl line dropping it
TTTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCA ACAGCCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCCCCTCAGAAGCAGGAGCCGA
lcl line dropping it
ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAACATGGAAAAGTT TAGTAAAACACCATATGTATGTTTCAGGGAAAGCTAGGGGATGGTTTTATAGACATCACTATGAAAGCCC
Note: I am assuming the newlines in the file are in the right places, i.e. each header is on its own line!
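Since the question asks for the sequence data to end up in a list or a file rather than printed, the same check can be wrapped as below. This is only a sketch under the same assumption (each header is on its own line and starts with `lcl|`); the function names and the output path are hypothetical.

```python
def drop_header_lines(lines, prefix="lcl|"):
    """Return the lines that do not start with prefix, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if not line.startswith(prefix)]


def filter_file(src_path, dst_path, prefix="lcl|"):
    """Copy src_path to dst_path without the header lines; return the kept lines."""
    with open(src_path, "r") as src:
        kept = drop_header_lines(src, prefix)
    with open(dst_path, "w") as dst:
        # Write each kept line back out with a newline terminator
        dst.writelines(line + "\n" for line in kept)
    return kept


# In-memory demonstration with made-up, shortened lines
sample_lines = [
    "lcl|AF033819.3_cds_AAC82593.1_1 [gene=gag]\n",
    "ATGGGTGCGAGAGCG\n",
    "lcl|AF033819.3_cds_AAC82594.1_3 [gene=vif]\n",
    "ATGGAAAACAGATGG\n",
]
sequences = drop_header_lines(sample_lines)
print(sequences)
```

For the real file, something like `filter_file('lcl.txt', 'sequences.txt')` would write the remaining lines to a new file and also return them as a list.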