I am trying to scrape data from a website (http://sports.yahoo.com/nfl/players/8800/), and I am using urllib2 and BeautifulSoup. My code currently looks like this:
import re
import urllib2

from bs4 import BeautifulSoup

site = 'http://sports.yahoo.com/nfl/players/8800/'
response = urllib2.urlopen(site)
html = response.read()
soup = BeautifulSoup(html)
rushing = []
passing = []
receiving = []
# here is where my problem arises
for elem in soup.find_all('th', text=re.compile('2008')):
    passing = elem.parent.find_all('td', class_=re.compile('10'))
    rushing = elem.parent.find_all('td', class_=re.compile('20'))
    receiving = elem.parent.find_all('td', class_=re.compile('30'))
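The soup.find_all(... '2008') call matches three elements on this page, and all three show up when I print its result on its own. However, the for loop appears to run only once. How can I make sure the loop runs three times?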
Answer 0 (score: 1)
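As far as I understand, the loop most likely does run once per match, but each pass reassigns passing, rushing and receiving, so only the results from the last matching row are left when it finishes. You need to extend() the lists defined before the loop instead: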
rushing = []
passing = []
receiving = []
for elem in soup.find_all('th', text=re.compile('2008')):
    passing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('10'))])
    rushing.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('20'))])
    receiving.extend([td.text for td in elem.parent.find_all('td', class_=re.compile('30'))])

print passing
print rushing
print receiving
This prints:
[u'3']
[u'19', u'58', u'14.5', u'3.1', u'0']
[u'2', u'17', u'4.3', u'8.5', u'11', u'6.5', u'0']
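To see the difference between assigning and extending without fetching the page, here is a minimal sketch using plain lists. The rows variable and both helper functions are hypothetical, and the values are placeholders rather than real stats; each inner list stands in for the cells found in one matching row. It is written so that it also runs on Python 3.

```python
# Hypothetical stand-in for three matching table rows; the values are
# placeholders, not data taken from the page.
rows = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]


def collect_by_assignment(rows):
    """Mimics the question's loop: the name is rebound on every pass."""
    collected = []
    for row in rows:
        collected = row  # earlier rows are discarded here
    return collected


def collect_by_extend(rows):
    """Mimics the answer's loop: every pass appends to the same list."""
    collected = []
    for row in rows:
        collected.extend(row)
    return collected


overwritten = collect_by_assignment(rows)
accumulated = collect_by_extend(rows)
print(overwritten)
print(accumulated)
```

Both loops visit all three rows. Only the second one keeps what it found in the first two, which would explain why the original code looks as if it ran once.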