I'm trying to extract the text between the string "<div><p class='entete_propriete'>DNA sequence </p>" and "</div>" with this code:
from bs4 import BeautifulSoup

handle = open(i, 'r')
name = i.split('=')[1]
print(name)
soup = BeautifulSoup(handle, "lxml")
for item in soup:
    seq = soup.findAll(seq)
    print(seq)
<section>
<div><p class='entete_propriete' align='center'>Ends</p>
<br><span class='entete_propriete'>IR Length : </span>44/49<br><br><span class='entete_propriete_bis'>IRL : </span><span class='seq'>GAGGGTCGGCAGGGATTCGTGTAAAACACAGCCAAAAGTGAGCTAACTCC</span><br><span class='entete_propriete_bis'>IRR : </span><span class='seq'>GAGGGTCGACAGGGATTTGTGTAAAAAACAGCCAAAATTGAGCTAAATCT</span><br> </div>
<div><p class='entete_propriete' align='center'>Insertion site</p><br>
<table><tr><th>Left flank</th><th>Direct repeat</th><th>Right flank</th><th>DR Length</th></tr><tr> <td class='seq' align='right'>TCCACTACCT</td><td class='seq' align='center'></td><td class='seq' align='left'>TCGTTGAGCA</td><td class='seq' align='center'>0</td></tr></table> </div>
<div class="piedSection"></div>
</section>
<section>
<div id=seq_ident><p>IS1007</p><ul><li><span class='entete_propriete'>Family </span>IS6</li><li><span class='entete_propriete'>Group </span></li></ul></div><span class='entete_propriete'> MGE type </span>IS<span class='entete_propriete_decal'>Related element(s) : </span><br><span class='entete_propriete'>Isoform </span><span class='entete_propriete_decal'>Synonym(s) </span> <div class="piedSection"></div>
</section>
<div><p class='entete_propriete'>DNA sequence </p>
<div class='seq'>GGCACTGTTGCAAATAGGCTGACATGATAAGCTAAATATCTTATTTATTTCGAGATACAGCAGATGAATCCCTTCCACGGTCGGCACTTTCAAGGTGAAA<br />
GAGAAGTTTGGCTAGTAAATAGAGTTTTCGGTCTCTAAGCTTTTTTGAAGGGAAAATCATTGACTCAGAT<br />
CCCTATTTGCAACAGTGCC </div>
The output looks like this:
IS1007
[]
[]
What I would like to get is the sequence text, something like this (once I have it, I can remove the <br /> tags myself):
TATCTTATTTATTTCGAGATACAGCAGATGAATCCCTTCCACGGTCGGCACTTTCAAGGTGAAA<br />
GAGAAGTTTGGCTAGTAAATAGAGTTTTCGGTCTCTAAGCTTTTTTGAAGGGAAAATCACTCAG<br />
ATCCCTATTTGCAACAGTGCC </div>
Any suggestions for extracting the sequence that sits between
<div><p class='entete_propriete'>DNA sequence </p>
<div class='seq'>
and
</div>
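One way to pull out exactly that sequence is to walk the markup and collect text only inside the <div class='seq'> that follows the "DNA sequence" heading. This is a minimal sketch under stated assumptions: it swaps BeautifulSoup for the standard-library html.parser.HTMLParser so it runs without third-party packages, the class name SeqExtractor is hypothetical, the target div is assumed to contain no nested <div>, and the sample markup is a shortened copy of the snippet in the question with the sequence trimmed.

```python
from html.parser import HTMLParser


class SeqExtractor(HTMLParser):
    """Hypothetical helper: collect the text of the first <div class='seq'>
    that appears after a <p class='entete_propriete'>DNA sequence </p> heading.
    Assumes the target div contains no nested <div> elements."""

    def __init__(self):
        super().__init__()
        self.in_heading = False     # currently inside <p class='entete_propriete'>
        self.after_heading = False  # the "DNA sequence" heading has been seen
        self.in_seq = False         # currently inside the target <div class='seq'>
        self.done = False           # target div already closed; ignore the rest
        self.parts = []             # raw text chunks from the target div

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "entete_propriete" in classes:
            self.in_heading = True
        elif tag == "div" and "seq" in classes and self.after_heading and not self.done:
            self.in_seq = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_heading = False
        elif tag == "div" and self.in_seq:
            self.in_seq = False
            self.done = True

    def handle_data(self, data):
        if self.in_heading and data.strip() == "DNA sequence":
            self.after_heading = True
        elif self.in_seq:
            self.parts.append(data)

    def sequence(self):
        # <br /> yields no text, so only the line breaks and spaces remain to drop
        return "".join("".join(self.parts).split())


html = """<div><p class='entete_propriete'>DNA sequence </p>
<div class='seq'>GGCACTGTTG<br />
GAGAAGTTTG<br />
CCCTATTTGC </div>"""

parser = SeqExtractor()
parser.feed(html)
parser.close()
print(parser.sequence())  # GGCACTGTTGGAGAAGTTTGCCCTATTTGC
```

Because it keys on the heading text and on the div tag, it skips the other class='seq' elements in the page (the span and td cells in the "Ends" and "Insertion site" sections).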
Answer 0 (score: 0)
You can use the get_text() method or the text attribute to get the text out of a tag:
for seq in soup.find_all('div', class_='seq'):
    # get_text() returns the text of the tag and all of its descendants
    print(seq.get_text())
You can also use seq.text instead.
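get_text() (or .text) drops the <br /> tags but keeps the line breaks that follow them, so the sequence still comes back split over several lines; a raw fragment copied from the page, like the one in the question, still contains the tags themselves. A minimal sketch that reduces either form to one continuous string with the standard library, assuming the input is a small fragment that has already been isolated (a regular expression is not a general HTML parser); clean_sequence is a hypothetical name and the sample strings are shortened from the question.

```python
import re


def clean_sequence(fragment):
    """Hypothetical helper: strip leftover tags and whitespace,
    leaving one continuous sequence string."""
    without_tags = re.sub(r"<[^>]+>", "", fragment)  # drop <br />, </div>, ...
    return "".join(without_tags.split())              # drop line breaks and spaces


# Raw fragment that still contains tags
raw = "TATCTTATTT<br />\nGAGAAGTTTG<br />\nATCCCTATTT </div>"
# Roughly what get_text() returns: no tags, line breaks kept
text = "TATCTTATTT\nGAGAAGTTTG\nATCCCTATTT "

print(clean_sequence(raw))   # TATCTTATTTGAGAAGTTTGATCCCTATTT
print(clean_sequence(text))  # TATCTTATTTGAGAAGTTTGATCCCTATTT
```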