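There are multiple div elements that share the same class name but have different IDs:
<div class="starting-lineups__matchup" data-gamepk="******">
I am able to scrape the data I need from inside these elements, but I keep having to inspect the page to find the value of data-gamepk. Is there a way to scrape this number?
This is the site I am scraping, and my code is below:
https://www.mlb.com/starting-lineups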
# main table that contains the data
gamelist = soup.find('div', attrs={'class': 'starting-lineups__container-multi'})
user = input()
# game specific data
game = gamelist.find('div', attrs={'data-gamepk': user})
# loop through away team name
for teams in game.find_all('span', attrs={'class': 'starting-lineups__team-name--away'}):
    for team_a in teams.find_all("a"):
        print(team_a.text)
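So every element with the class 'starting-lineups__matchup' sits inside the element with the class 'starting-lineups__container-multi'. Each 'starting-lineups__matchup' element has a number associated with it. The user types in this number by hand to scrape the data inside that specific element, which in the code above is only the away team name, starting-lineups__team-name--away.
To find the number for each game, I have been inspecting the web page. Instead of visiting the site and reading through the HTML myself, I would like to scrape that number along with the team names associated with it.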
Answer 0: (score: 0)
You may be able to use this:
import re
from bs4 import BeautifulSoup

# `html` is assumed to be the response object for the page (e.g. from requests.get)
soup = BeautifulSoup(html.text, 'lxml')
results = soup.find_all("div", {"data-gamepk": re.compile(r".*")})
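This lists every div that has a "data-gamepk" attribute.
PS: Using True instead of re.compile(r".*") should also work, since BeautifulSoup treats True as "the attribute is present, with any value".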
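As a minimal sketch of the True variant mentioned in the PS: the HTML string and the game numbers below are invented for illustration, and only the class and attribute names come from the question. It parses an inline snippet rather than the live page, so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the gamepk values are made up.
SAMPLE_HTML = """
<div class="starting-lineups__container-multi">
  <div class="starting-lineups__matchup" data-gamepk="111111"></div>
  <div class="starting-lineups__matchup" data-gamepk="222222"></div>
  <div class="unrelated"></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An attribute value of True matches any div that carries the attribute at all,
# so the container div and the unrelated div are skipped.
divs = soup.find_all("div", attrs={"data-gamepk": True})

# Read the attribute value off each matching tag.
game_ids = [div["data-gamepk"] for div in divs]
print(game_ids)
```

Each matched tag behaves like a dictionary for its attributes, which is how the number itself is read once the div has been found.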
Answer 1: (score: 0)
I hope I understood your question correctly: this script prints the game number and the away and home team names:
import requests
from bs4 import BeautifulSoup

url = 'https://www.mlb.com/starting-lineups'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for game in soup.select('[data-gamepk]'):
    print(game['data-gamepk'])
    print(game.select_one('.starting-lineups__team-name--away').get_text(strip=True))
    print(game.select_one('.starting-lineups__team-name--home').get_text(strip=True))
    print('-' * 80)
Prints:
631112
Cubs
Pirates
--------------------------------------------------------------------------------
631432
Rangers
Astros
--------------------------------------------------------------------------------
631146
Nationals
Phillies
--------------------------------------------------------------------------------
631234
Yankees
Mets
--------------------------------------------------------------------------------
631368
Padres
Angels
--------------------------------------------------------------------------------
631614
Blue Jays
Red Sox
--------------------------------------------------------------------------------
631405
White Sox
Royals
--------------------------------------------------------------------------------
631370
D-backs
Dodgers
--------------------------------------------------------------------------------
631055
Athletics
Mariners
--------------------------------------------------------------------------------
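If the goal is to look a game up later instead of only printing it, the same selectors could feed a dictionary keyed by game number. This is a rough sketch, not part of the answer above: extract_matchups and SAMPLE_HTML are hypothetical names, the sample markup and game numbers are invented, and matchups missing either team name are skipped on the assumption that select_one may return None for them.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the gamepk values are made up.
SAMPLE_HTML = """
<div class="starting-lineups__container-multi">
  <div class="starting-lineups__matchup" data-gamepk="111111">
    <span class="starting-lineups__team-name--away"><a href="#">Cubs</a></span>
    <span class="starting-lineups__team-name--home"><a href="#">Pirates</a></span>
  </div>
  <div class="starting-lineups__matchup" data-gamepk="222222">
    <span class="starting-lineups__team-name--away"><a href="#">Rangers</a></span>
  </div>
</div>
"""


def extract_matchups(html):
    """Map each data-gamepk value to an (away, home) tuple of team names."""
    soup = BeautifulSoup(html, "html.parser")
    matchups = {}
    for game in soup.select("[data-gamepk]"):
        away = game.select_one(".starting-lineups__team-name--away")
        home = game.select_one(".starting-lineups__team-name--home")
        # select_one returns None when nothing matches; skip incomplete entries
        if away is None or home is None:
            continue
        matchups[game["data-gamepk"]] = (
            away.get_text(strip=True),
            home.get_text(strip=True),
        )
    return matchups


matchups = extract_matchups(SAMPLE_HTML)
for game_id, (away_team, home_team) in matchups.items():
    print(game_id, away_team, "@", home_team)
```

For the live page, the string passed in would be the body fetched with requests, as in the script above. The dictionary then replaces the manual input step from the question, since every game number is available as a key.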