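I'm having trouble scraping the whole table from this URL. My code only manages to scrape the first row and ignores the rest. Can anyone help or point me in the right direction?
My code is: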
import urllib2
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.live-footballontv.com/live-englishfootball-on-tv.html').read())
for row in soup('table', {'class': 'gridtable'})[0].tbody('tr')[1:]:
    tds = row('td')
    print tds[0].string, tds[1].string, tds[2].string, tds[3].string, tds[4].string,
Here is the error:
Tue 4th Feb Fulham v Sheffield United FA Cup 4th Round Replay 19:45 ITV4 / ITV4 HD
Traceback (most recent call last):
  File "C:/Users/owner/PycharmProjects/Football TV Guide App/TVGuide.py", line 11, in <module>
    print tds[0].string, tds[1].string, tds[2].string, tds[3].string, tds[4].string, ths[0].string
IndexError: list index out of range
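Answer 0 (score: 1)
Try this. It loops over whatever th and td cells each row contains instead of indexing five fixed td positions, so a row with fewer cells does not raise an IndexError: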
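import urllib2
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.live-footballontv.com/live-english-football-on-tv.html').read())
for row in soup('table', {'class': 'gridtable'})[0].tbody('tr'):
    # Header cells, if this row has any
    ths = row('th')
    for th in ths:
        print th.string,
        print ',',
    # Data cells, however many this row has
    tds = row('td')
    for td in tds:
        print td.string,
        print ',',
    # End the output line for this row
    print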
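If you want to try the same idea without hitting the site, here is a minimal sketch. It is not the code above: it is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, so it runs with no extra packages. The SAMPLE_HTML markup, the RowCollector class and the scrape_rows helper are all made up for illustration. I am assuming the real page mixes th header rows, single-cell date rows and five-cell fixture rows, which I have not verified.

```python
from html.parser import HTMLParser

# Hypothetical sample markup (an assumption, not the real page): a header
# row of <th> cells, one five-cell fixture row, and a single-cell date row.
SAMPLE_HTML = """
<table class="gridtable">
  <tr><th>Date</th><th>Fixture</th><th>Competition</th><th>Time</th><th>Channel</th></tr>
  <tr><td>Tue 4th Feb</td><td>Fulham v Sheffield United</td><td>FA Cup 4th Round Replay</td><td>19:45</td><td>ITV4 / ITV4 HD</td></tr>
  <tr><td colspan="5">Wed 5th Feb</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def scrape_rows(html):
    """Return one list of cell strings per table row, whatever its length."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_rows(SAMPLE_HTML)
for cells in rows:
    # Join whatever cells the row has; never index a fixed position
    print(", ".join(cells))
```

The point is the same as in the loop above: each row is treated as a list of unknown length, so the single-cell row is printed as is rather than breaking on tds[1].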