Scraper only prints the first row even though there is a loop

Time: 2014-02-04 22:52:12

Tags: python web-scraping beautifulsoup

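I am having trouble scraping the whole table from this URL. My code only manages to scrape the first row and ignores the rest. Can anyone help or point me in the right direction?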

My code is:

    import urllib2
    from BeautifulSoup import BeautifulSoup


    soup = BeautifulSoup(urllib2.urlopen('http://www.live-footballontv.com/live-englishfootball-on-tv.html').read())

    for row in soup('table', {'class': 'gridtable'})[0].tbody('tr')[1:]:
        tds = row('td')

        print tds[0].string, tds[1].string, tds[2].string, tds[3].string, tds[4].string,

Here is the error:

      Tue 4th Feb Fulham v Sheffield United  FA Cup 4th Round Replay 19:45           ITV4 / ITV4 HD
      Traceback (most recent call last):
      File "C:/Users/owner/PycharmProjects/Football TV Guide App/TVGuide.py", line 11,   in <module>
      print tds[0].string, tds[1].string, tds[2].string, tds[3].string, tds[4].string,  ths[0].string
      IndexError: list index out of range

1 answer:

Answer 0 (score: 1)

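Try this. It loops over whatever cells each row actually has instead of indexing a fixed number of them: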

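    import urllib2
    from BeautifulSoup import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen('http://www.live-footballontv.com/live-english-football-on-tv.html').read())

    # Walk every row and print whatever header (th) and data (td) cells it has
    for row in soup('table', {'class': 'gridtable'})[0].tbody('tr'):
        ths = row('th')
        for th in ths:
            print th.string,
            print ',',
        tds = row('td')
        for td in tds:
            print td.string,
            print ',',
        print  # end the output line after each row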
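
For anyone who wants to try the idea without hitting the site, here is a minimal sketch of the same approach on an inline snippet. It rests on several assumptions: it targets Python 3, it swaps BeautifulSoup for the standard-library `html.parser` so it runs with no extra installs, and `SAMPLE_HTML`, `RowCollector` and `table_rows` are made-up names for illustration. The sample markup is hypothetical and only assumes that some rows have fewer cells than others, which is the situation where indexing `tds[4]` fails.

```python
from html.parser import HTMLParser

# Hypothetical markup, not copied from the real page: one row holds a single
# <th> cell, the other holds four <td> cells.
SAMPLE_HTML = """
<table class="gridtable">
  <tr><th>Tue 4th Feb</th></tr>
  <tr>
    <td>Fulham v Sheffield United</td>
    <td>FA Cup 4th Round Replay</td>
    <td>19:45</td>
    <td>ITV4 / ITV4 HD</td>
  </tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, None outside <tr>
        self._cell = None  # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return a list of rows, each a list of cell texts of any length."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = table_rows(SAMPLE_HTML)
for cells in rows:
    # Join however many cells the row has; no fixed index is ever used.
    print(", ".join(cells))
```

Because each row is handled as a list of whatever length it happens to be, a short row just produces a short line instead of raising `IndexError`.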