Skipping some lines when reading text inside a tag with BeautifulSoup

Time: 2015-06-08 19:40:14

Tags: python html python-2.7 web-scraping beautifulsoup

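I have this block of text inside a div tag, but I don't need the first 10 lines and the last 10 lines of the text. How can I do this without creating any temporary file? Currently, my simple code for reading and writing the content is: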

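from bs4 import BeautifulSoup

# r is the HTTP response obtained earlier (not shown)
soup = BeautifulSoup(r.text, "html.parser")
x = soup.find("div", {"class": "content"})
x = x.text
# write the extracted text as UTF-8 bytes; the with block closes the file
with open('test.txt', 'wb') as f:
    f.write(x.encode('utf-8'))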

The block of text in the HTML code:

<div class="content">
<div class="heading">result</div>
<p class="sub-heading2">CSE</p>
<div class="content" style="font-family:courier">
UNIVERSITY&nbsp;<br />
<br />
GRADE&nbsp;SHEET&nbsp;-&nbsp;NOV/DEC&nbsp;2014&nbsp;EXAMINATIONS.&nbsp;&nbsp;<br />
<br />
Subject&nbsp;Code&nbsp;&nbsp;:&nbsp;CSE504<br />
Subject&nbsp;Title&nbsp;:&nbsp;SOFTWARE&nbsp;ENGINEERING<br />
Subject&nbsp;Credit&nbsp;:&nbsp;4.0&nbsp;<br />
<br />

REGNO       INT &nbsp;&nbsp;&nbsp;UM        TOT GRADE<br />
&nbsp;2037  13.30   &nbsp;&nbsp;&nbsp;AB    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0.00    I<br />
2029    15.40   &nbsp;&nbsp;&nbsp;10    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;26.00 F<br />
&nbsp;2018  19.90   &nbsp;29.5  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;50.00 D<br />
&nbsp;2020  22.60   &nbsp;&nbsp;&nbsp;30    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;53.00 C<br />
&nbsp;2029  26.40   &nbsp;18.5  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;45.00 

No.&nbsp;of&nbsp;Malpractices&nbsp;=&nbsp;0<br />
No.&nbsp;of&nbsp;Detentions&nbsp;=&nbsp;1<br />
No.&nbsp;of&nbsp;NA&nbsp;=&nbsp;0<br />
No.&nbsp;of&nbsp;students&nbsp;appeared&nbsp;=&nbsp;113<br />
</div></div></div>

I only need the table in the middle of the code, i.e. the REGNO INT TOT GRADE table.
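
One possible way to do this entirely in memory is to split the extracted text into lines and keep only the part that is needed. The sketch below is not a confirmed answer. The helper names `content_text`, `trim_lines` and `table_rows` are hypothetical, and it assumes that the table starts at a line beginning with `REGNO` and ends at the first blank line or at a line beginning with `No.`, as in the sample above. `trim_lines` drops a fixed number of lines from each end, while `table_rows` looks for the header instead, which may be more robust if the number of lines around the table varies.

```python
def content_text(html):
    """Return the text of the first <div class="content"> in html."""
    # Imported here so the two pure helpers below work without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"class": "content"})
    return div.get_text() if div is not None else ""


def trim_lines(text, skip_head=10, skip_tail=10):
    """Return text without its first skip_head and last skip_tail lines."""
    lines = text.splitlines()
    end = max(len(lines) - skip_tail, 0)
    return "\n".join(lines[skip_head:end])


def table_rows(text, header_prefix="REGNO", stop_prefix="No."):
    """Collect lines from the header line up to a blank line or stop_prefix.

    The prefixes are assumptions based on the sample HTML in the question.
    """
    rows = []
    in_table = False
    for line in text.splitlines():
        # &nbsp; entities become U+00A0 after parsing; normalise to spaces.
        clean = line.replace(u"\xa0", " ").strip()
        if clean.startswith(header_prefix):
            in_table = True
        elif in_table and (not clean or clean.startswith(stop_prefix)):
            break
        if in_table:
            rows.append(clean)
    return rows
```

With the code from the question, this could be used as `table_rows(content_text(r.text))`, or as `trim_lines(x)` on the text that was already extracted, so no temporary file is needed.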

0 answers:

No answers