I have several text files in the following format:
Technical :
localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA
Work :
We find that random and λ-DNA have localization lengths allowing for electron motion among a few dozen basepairs only.
Technical :
We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.
Education :
Electronic, DNA sequence
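Now I want to extract the paragraphs under the heading "Technical". With my code I can extract a specific paragraph between two headings, but not all paragraphs that share the same heading.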
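# Copy every line after a "Technical" header until a "Work" header appears.
with open("aks.txt") as infile, open("fffm.txt", "w") as outfile:
    copy = False
    for line in infile:
        # Header lines look like "Technical :", so drop the trailing colon before comparing.
        header = line.strip().rstrip(":").strip()
        if header == "Technical":
            copy = True
        elif header == "Work":
            copy = False
        elif copy:
            outfile.write(line)

with open("fffm.txt") as fh:
    contents = fh.read()
print(len(contents))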
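The loop above stops copying only when it sees a "Work" header, so a "Technical" section followed by any other header (here "Education") keeps copying into the output. A minimal sketch of the same line-by-line idea that instead stops at the next header of any kind, assuming (as in the sample) that a header is a single word followed by a colon on a line of its own. The function name `extract_sections` and the sample text are illustrative only:

```python
import re

# Assumption: a header is one word plus an optional space and a colon, alone on
# its line, e.g. "Technical :" or "Work:". Every other line is body text.
HEADER = re.compile(r"^\s*(\w+)\s*:\s*$")


def extract_sections(lines, wanted="Technical"):
    """Return the body of every section whose header equals `wanted`.

    Copying starts at a matching header and stops at the next header of any
    kind, so repeated "Technical" sections are all collected.
    """
    sections = []
    current = None  # body lines of the section being copied, or None if skipping
    for line in lines:
        m = HEADER.match(line)
        if m:
            if current is not None:  # close the section we were copying
                sections.append("".join(current))
            current = [] if m.group(1) == wanted else None
        elif current is not None:
            current.append(line)
    if current is not None:  # input ended inside a wanted section
        sections.append("".join(current))
    return sections


# Placeholder text with the same layout as the question's file.
sample = """Technical :
first technical paragraph
Work :
work paragraph
Technical :
second technical paragraph
Education :
Electronic, DNA sequence
"""

for body in extract_sections(sample.splitlines(keepends=True)):
    print(body, end="")
```

Because iterating over an open text file also yields lines with their newlines, the function can be given the file object directly, e.g. `outfile.writelines(extract_sections(infile))` inside the question's `with` statement.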
Answer 0 (score: 0)
Use a regular expression with the re module. See: https://docs.python.org/2/library/re.html
This code does what you need:
import re
the_text = """Technical :
localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA
Work :
We find that random and λ-DNA have localization lengths allowing for electron motion among a few dozen basepairs only.
Technical :
We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.
Education :
Electronic, DNA sequence"""
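
# re.findall returns one (title, content) tuple per header; "." does not match a
# newline, so content is only the first line after each header.
for title, content in re.findall(r'(\w+) +?:\s+?(.+)', the_text):
    if title.lower() == "technical":
        print("Title: {}".format(title))
        print("Content: {}\n".format(content))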
Output:
Title: Technical
Content: localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA
Title: Technical
Content: We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.
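The pattern above captures only the single line that follows each header, because `.` stops at a newline. If a section can span several lines, one alternative (a sketch, not the answerer's code) is to split the whole text on header lines with `re.split`: when the pattern has one capturing group, the result alternates between header names and the text between headers. This assumes every header is one word and a colon alone on its line; a body line of exactly that shape would be mistaken for a header. The name `sections_by_title` and the sample text are hypothetical:

```python
import re

# Assumption: a header is one word, optional spaces/tabs, and a colon, alone on a line.
# [ \t]* is used instead of \s* so the match never runs across a line break.
HEADER_LINE = re.compile(r"^(\w+)[ \t]*:[ \t]*$", re.MULTILINE)


def sections_by_title(text, title):
    """Return the stripped body of every section headed by `title` (case-insensitive)."""
    # With one capturing group, re.split returns
    # [text before first header, name1, body1, name2, body2, ...].
    parts = HEADER_LINE.split(text)
    pairs = zip(parts[1::2], parts[2::2])
    return [body.strip() for name, body in pairs if name.lower() == title.lower()]


# Placeholder text in which the first "Technical" section spans two lines.
sample = """Technical :
line one
line two
Work :
work line
Technical :
another line
Education :
Electronic, DNA sequence"""

for body in sections_by_title(sample, "Technical"):
    print("Content: {}\n".format(body))
```

Any text before the first header is discarded, since it belongs to no section.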