Iterating over a BeautifulSoup result set in Python

Date: 2015-02-08 04:35:32

Tags: python html python-2.7 beautifulsoup html-parsing

I'm trying to scrape data from a website (http://sports.yahoo.com/nfl/players/8800/) using urllib2 and BeautifulSoup. My code currently looks like this:

import re
import urllib2

from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []

# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))

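The soup.find_all(...'2008')) expression matches three elements on this page, and all three show up when I print that result on its own. However, the for loop seems to run only once. How can I make sure the loop runs three times?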

1 answer:

Answer 0 (score: 1)

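As far as I understand, the loop does run once per match, but each iteration reassigns passing, rushing and receiving, so only the results from the last matching row are left afterwards. You need to extend() the lists defined before the loop instead: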

rushing = []
passing = []
receiving = []

for elem in soup.find_all('th', text=re.compile('2008')):
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))])
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))])
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))])

print passing
print rushing
print receiving

Prints:

[u'3']
[u'19', u'58', u'14.5', u'3.1', u'0']
[u'2', u'17', u'4.3', u'8.5', u'11', u'6.5', u'0']
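
To see the difference between reassigning and extending without hitting the network, here is a minimal sketch. It is not the code from the question: the `rows` data and the two helper functions, `collect_by_assignment` and `collect_by_extend`, are hypothetical stand-ins, with each inner list playing the role of the cells found in one matching '2008' row.

```python
# Hypothetical stand-in data: one inner list per matching '2008' row.
rows = [["3"], ["19", "58"], ["2", "17", "4.3"]]


def collect_by_assignment(rows):
    """Mimic the question's loop: rebind the name on every iteration."""
    result = []
    for cells in rows:
        result = list(cells)  # replaces whatever was collected before
    return result


def collect_by_extend(rows):
    """Mimic the answer's loop: grow one list across iterations."""
    result = []
    for cells in rows:
        result.extend(cells)  # appends this row's cells to the running list
    return result


print(collect_by_assignment(rows))
print(collect_by_extend(rows))
```

Both loops visit all three rows. The first helper returns only the cells of the last row, which can make it look as if the loop ran once, while the second returns the cells of every row in order.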