Question

I'm trying to scrape data from a website (http://sports.yahoo.com/nfl/players/8800/), and I'm using urllib2 and BeautifulSoup. My code currently looks like this:

import re
import urllib2

from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []

# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))

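The soup.find_all(... '2008') part matches three elements on this page, and all three show up when I print that part on its own. However, the for loop seems to run only once. How can I make sure the loop runs three times?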

Answer 1

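As far as I understand, the loop does run three times, but each pass reassigns passing, rushing, and receiving, so only the results from the last matching row survive. You need to extend() the lists defined before the loop instead: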

rushing = []
passing = []
receiving = []

for elem in soup.find_all('th', text=re.compile('2008')):
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))])
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))])
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))])

print passing
print rushing
print receiving

Prints:

[u'3']
[u'19', u'58', u'14.5', u'3.1', u'0']
[u'2', u'17', u'4.3', u'8.5', u'11', u'6.5', u'0']
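
To see why reassigning inside the loop can look like the loop ran only once, here is a minimal sketch that needs neither the network nor BeautifulSoup. The `rows` data and both helper functions are hypothetical stand-ins made up for illustration; they are not taken from the Yahoo page.

```python
# Hypothetical stand-in for three matching table rows; the values are made up.
rows = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]


def collect_by_assignment(rows):
    """Rebind the name on every pass, as the loop in the question does."""
    result = []
    for row in rows:
        result = list(row)  # replaces whatever earlier passes stored
    return result


def collect_by_extend(rows):
    """Grow one list across all passes, as the answer suggests."""
    result = []
    for row in rows:
        result.extend(row)  # keeps items from every pass
    return result


overwritten = collect_by_assignment(rows)
accumulated = collect_by_extend(rows)

print(overwritten)   # only the last row's items
print(accumulated)   # items from all three rows, in order
```

Both loops visit all three rows; only the second keeps what each pass found. If the per-row grouping matters, using append(row) instead of extend(row) would give a list of lists rather than one flat list.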
