I have some code that outputs the teams and all of their score values (without spaces) from the page http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01.
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01")
content = url.read()
soup = BeautifulSoup(content)

listnames = ''
listscores = ''
for table in soup.find_all('table', class_='scores'):
    for row in table.find_all('tr'):
        for cell in row.find_all('td', class_='yspscores'):
            if cell.text.isdigit():
                listscores += cell.text
        for cell in row.find_all('td', class_='yspscores team'):
            listnames += cell.text

print(listnames)
print(listscores)
What I can't work out is how to get Python to take the extracted information and attach the correct integer values to the correct team, like this:
Team X: 1, 5, 11.
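The problem with the site is that all the scores share the same class, and all the tables are under the same class as well. The only thing that differs is the href.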
Answer 0 (score: 0)
If you want to associate values with names, you would usually use a dict. Here is a modification of your code to illustrate the principle:
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen('http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01')
content = url.read()
soup = BeautifulSoup(content)

# Maps each team name to the list of score strings found in its row
results = {}

for table in soup.find_all('table', class_='scores'):
    for row in table.find_all('tr'):
        scores = []
        name = None
        for cell in row.find_all('td', class_='yspscores'):
            # The cell holding the team name is the one that contains a link
            link = cell.find('a')
            if link:
                name = link.text
            elif cell.text.isdigit():
                scores.append(cell.text)
        # Rows without a team link are not team rows, so skip them
        if name is not None:
            results[name] = scores

for name, scores in results.items():
    print('%s: %s' % (name, ', '.join(scores)))
...which gives this output when run:
$ python3 get_scores.py
St. Louis: 1, 2, 1
San Jose: 0, 3, 0
Colorado: 0, 0, 2
Dallas: 0, 0, 0
New Jersey: 0, 1, 0
NY Islanders: 2, 0, 1
Nashville: 0, 0, 2, 0
Minnesota: 0, 1, 0
Detroit: 1, 2, 0
NY Rangers: 1, 1, 2
Anaheim: 0, 3, 1
Winnipeg: 2, 0, 0
Chicago: 1, 1, 0, 0
Calgary: 0, 0, 1
Vancouver: 0, 1, 1
Edmonton: 3, 0, 1
Montreal: 1, 1, 2
Carolina: 1, 0, 0
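Apart from using a dictionary, the other significant change is that we now check for the presence of an a element to get the team's name, rather than for the additional team class. This is really a stylistic choice, but to me the code seems more expressive this way.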
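Note that the scores collected above are still strings, because cell.text returns text. If you need real integers, as the question asks, one option is to convert them once the dict has been built. Here is a minimal sketch, assuming a results dict shaped like the one above. The helper name to_int_scores is made up for illustration, and the sample data is copied from three lines of the output shown earlier so that the snippet runs without fetching the page:

```python
# Sample data copied from the output above: team name -> list of score strings.
results = {
    'St. Louis': ['1', '2', '1'],
    'San Jose': ['0', '3', '0'],
    'Nashville': ['0', '0', '2', '0'],
}


def to_int_scores(results):
    """Return a new dict with every score string converted to an int.

    Hypothetical helper, not part of the original code.
    """
    return {name: [int(s) for s in scores] for name, scores in results.items()}


int_results = to_int_scores(results)

for name, scores in int_results.items():
    # With integers you can do arithmetic, e.g. add up each team's scores
    print('%s: %s (total %d)' % (name, ', '.join(map(str, scores)), sum(scores)))
```

Since the scraping code only appends cells that pass isdigit(), int() should be safe there, but that is worth double-checking against the actual page content.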