Data format:
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
I want to get only 90, then 88, and so on. Here is what I tried:
#2.7 version python
#link I used as input: http://python-data.dr-chuck.net/comments_283660.html
import urllib
from BeautifulSoup import *
url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
r=0;
t=0
tags = soup('span')
for tag in tags:
    #print tag.get('class', None)
    #print tag.get('class="comments">', None)
    print 'Contents:',tag.contents
The output is:
Contents: [u'100']
Contents: [u'100']
Contents: [u'97']
Contents: [u'95']
....
How can I avoid the "u" and get just 100, 100, 97, 95, ...?
Answer 0 (score: 1)
You can index the contents list: print 'Contents:',tag.contents[0]
Or better, just extract the text from each span:
tags = soup('span')
for tag in tags:
    print('Contents:',tag.text)
Using your link, that gives you:
('Contents:', u'100')
('Contents:', u'100')
('Contents:', u'97')
('Contents:', u'95')
('Contents:', u'95')
('Contents:', u'94')
('Contents:', u'93')
('Contents:', u'92')
('Contents:', u'84')
('Contents:', u'78')
('Contents:', u'78')
('Contents:', u'76')
('Contents:', u'69')
('Contents:', u'64')
('Contents:', u'60')
('Contents:', u'58')
('Contents:', u'53')
('Contents:', u'51')
('Contents:', u'49')
('Contents:', u'49')
('Contents:', u'45')
('Contents:', u'45')
('Contents:', u'45')
('Contents:', u'44')
('Contents:', u'39')
('Contents:', u'38')
('Contents:', u'37')
('Contents:', u'35')
('Contents:', u'34')
('Contents:', u'33')
('Contents:', u'32')
('Contents:', u'32')
('Contents:', u'30')
('Contents:', u'29')
('Contents:', u'28')
('Contents:', u'27')
('Contents:', u'21')
('Contents:', u'19')
('Contents:', u'16')
('Contents:', u'16')
('Contents:', u'15')
('Contents:', u'13')
('Contents:', u'13')
('Contents:', u'12')
('Contents:', u'11')
('Contents:', u'9')
('Contents:', u'6')
('Contents:', u'2')
('Contents:', u'1')
('Contents:', u'1')
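The u prefix just means you have a unicode string. If you really want to remove it, you can call str(tag.text), or, if you want integers, call int(tag.text). I would also suggest that you upgrade to bs4.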
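For what it's worth, here is a rough sketch of how the same extraction might look after moving to bs4 on Python 3. It assumes the bs4 package is installed, and it parses the two sample rows from the question inline rather than fetching the URL; the wrapping table tag and the variable name numbers are my own additions for illustration.

```python
from bs4 import BeautifulSoup

# Two sample rows copied from the question; a real run would fetch the page instead.
html = """
<table>
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
</table>
"""

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")

# Keep only spans with class "comments" and convert their text to integers.
numbers = [int(span.get_text()) for span in soup.find_all("span", class_="comments")]

for n in numbers:
    print(n)
```

On Python 3 every str is already unicode, so no u prefix shows up even when you print a list of strings. Converting with int() is still worthwhile if you plan to sum or compare the values.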