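I'm writing a small scraper in Python with BS4 (BeautifulSoup 4) to pull MLB schedule data from ESPN.com.
It's almost done, but I've hit a problem with this markup: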
<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>
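I can already read the contents of the <span> </span>, but what I want is the value of the title attribute on <abbr title>.
I don't know what I'm missing, and I haven't figured out how to do it.
Thanks!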
Answer 0 (score: 3)
For your snippet, you want the title attribute of the abbr tag inside the anchor with the class team-name:
from bs4 import BeautifulSoup

h = """<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""
# Name the parser explicitly so bs4 does not warn or pick a different one per machine.
soup = BeautifulSoup(h, "html.parser")
print(soup.select_one("a.team-name abbr")["title"])
This gives you:
Kansas City Royals
Or using find:
h = """<div class="teams" data-behavior="fix_broken_images"><a name="&lpos=mlb:schedule:team" href="/mlb/team/_/name/kc"><img src="http://a.espncdn.com/combiner/i?img=/i/teamlogos/mlb/500/scoreboard/kc.png&h=50" class="schedule-team-logo"></a></div><a name="&lpos=mlb:schedule:team" class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a>"""
soup = BeautifulSoup(h, "html.parser")
print(soup.find("a", attrs={"class":"team-name"}).abbr["title"])
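Both calls above raise an error if the anchor or the abbr tag is missing, because select_one and find return None when nothing matches. A minimal sketch of a more defensive lookup follows; team_full_name is a hypothetical helper name used only for illustration, and the shortened snippet is a trimmed copy of the markup from the question.

```python
from bs4 import BeautifulSoup


def team_full_name(html):
    """Return the title of the <abbr> inside a.team-name, or None if absent.

    Hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    abbr = soup.select_one("a.team-name abbr")
    if abbr is None:
        # No matching tag: report that instead of raising TypeError.
        return None
    # .get() returns None when the attribute itself is missing.
    return abbr.get("title")


snippet = (
    '<a class="team-name" href="/mlb/team/_/name/kc">'
    '<span>Kansas City</span> '
    '<abbr title="Kansas City Royals">KC</abbr></a>'
)
print(team_full_name(snippet))  # Kansas City Royals
print(team_full_name('<a class="team-name"><span>Kansas City</span></a>'))  # None
```

Returning None keeps the caller in control of what to do with a row that lacks a team name, which may matter if the page layout changes.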
This gets all of the team names from the site:
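from bs4 import BeautifulSoup
import requests

url = "http://espn.go.com/mlb/schedule"
# Name the parser explicitly so bs4 does not warn or pick a different one per machine.
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Restrict the search to the schedule table, then read each abbr's title attribute.
table = soup.select_one("table.schedule.has-team-logos")
print([a["title"] for a in table.select("a.team-name abbr")])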
Output:
['Detroit Tigers', 'Washington Nationals', 'Kansas City Royals', 'New York Yankees', 'Oakland Athletics', 'Boston Red Sox', 'Pittsburgh Pirates', 'Cincinnati Reds', 'Milwaukee Brewers', 'Miami Marlins', 'Chicago White Sox', 'Texas Rangers', 'San Diego Padres', 'Chicago Cubs', 'Baltimore Orioles', 'Minnesota Twins', 'Cleveland Indians', 'Houston Astros', 'Arizona Diamondbacks', 'Colorado Rockies', 'Tampa Bay Rays', 'Seattle Mariners', 'New York Mets', 'Los Angeles Dodgers', 'Toronto Blue Jays', 'San Francisco Giants']
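If you also need the short code that is shown inside each abbr tag, the same selector can build a mapping from abbreviation to full name. This is only a sketch under assumptions: team_names_by_abbr is a hypothetical helper, the sample table is hand-written markup modeled on the snippet in the question rather than the live page, and the fallback to the whole document is a guess at what is useful when the table class is not found.

```python
from bs4 import BeautifulSoup


def team_names_by_abbr(html):
    """Map each abbr's text (e.g. 'KC') to its title attribute.

    Hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.schedule.has-team-logos")
    # Fall back to the whole document if the table is not present.
    scope = table if table is not None else soup
    # abbr[title] skips any abbr tag that has no title attribute.
    return {
        abbr.get_text(strip=True): abbr["title"]
        for abbr in scope.select("a.team-name abbr[title]")
    }


# Hand-written sample markup (an assumption, not copied from the live site).
sample = """<table class="schedule has-team-logos"><tr>
<td><a class="team-name" href="/mlb/team/_/name/det"><span>Detroit</span> <abbr title="Detroit Tigers">DET</abbr></a></td>
<td><a class="team-name" href="/mlb/team/_/name/kc"><span>Kansas City</span> <abbr title="Kansas City Royals">KC</abbr></a></td>
</tr></table>"""
print(team_names_by_abbr(sample))
# {'DET': 'Detroit Tigers', 'KC': 'Kansas City Royals'}
```

Because a dict keeps one value per key, teams that appear several times in the schedule collapse into a single entry, unlike the list above.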