Question

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://espnfc.com/tables/_/league/esp.1/spanish-la-liga?cc=5901"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div id', attrs={'class': 'content'})

rows = soup.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        print text,  
    print

I get the following (note that this is only a small part of the output; it is a football league table):

&nbsp; Overall None Home None Away None &nbsp;
POS None TEAM P W D L F A None W D L F A None W D L F A None GD Pts
1 
Barcelona 38 32 4 2 115 40 None 18 1 0 63 15 None 14 3

My question is: why is there a "None" after every few words? Is there a way to make it stop doing that?

Answer 1

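If you look at the website, you will see that there are spacers between some groups of columns, and each spacer is itself a td.

You may notice that all of the spacers have a width attribute. So you can skip them like this: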

cols = tr.findAll('td', width=None)

If you decide to switch to BeautifulSoup 4 at any stage, use:

cols = tr.findAll('td', width=False)
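
As a rough illustration of the BeautifulSoup 4 variant, here is a minimal sketch. It assumes the bs4 package is installed, and the HTML fragment is made up to imitate the real table (empty spacer cells that carry a width attribute); it is not copied from the site. It uses find_all and string=True, the BeautifulSoup 4 spellings of findAll and text=True.

```python
from bs4 import BeautifulSoup  # BeautifulSoup 4 (assumed to be installed)

# Made-up fragment imitating the table: spacer cells are empty and have a width.
html = """
<table>
  <tr><td>1</td><td width="10"></td><td>Barcelona</td><td>38</td></tr>
  <tr><td>2</td><td width="10"></td><td>Real Madrid</td><td>38</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # width=False keeps only the td tags that have no width attribute,
    # so the spacer cells never reach the inner lookup.
    cols = tr.find_all("td", width=False)
    rows.append([td.find(string=True) for td in cols])

for row in rows:
    print(" ".join(row))
```

With the spacers filtered out, every remaining cell in this fragment contains text, so no None ends up in the rows.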

Answer 2

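As The Docs say, .string is None when an element has more than one child. In this code, td.find(text=True) returns None when a td holds no text at all, which appears to be the case for the spacer cells here.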

The simplest way to get rid of the None values is:

for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        if text is not None:
            print text,  
    print

This checks whether text is None and, if it is, skips printing it.
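
The same check can also be written as a filter that builds each row as one string, which avoids the trailing-comma print and runs unchanged on Python 3. This is only a sketch: format_row is a hypothetical helper name, and the sample list is made-up data standing in for the values that td.find(text=True) returns for one row.

```python
def format_row(cell_texts):
    """Join the cell texts of one row, skipping cells where no text was found."""
    # None marks a td for which the text lookup found nothing.
    return " ".join(t for t in cell_texts if t is not None)


# Made-up values standing in for one row of lookups.
sample = ["1", None, "Barcelona", "38", None, "18"]
print(format_row(sample))  # prints: 1 Barcelona 38 18
```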

Why does this BeautifulSoup code output "None"?
