Question

I want to scrape a URL. I can identify the data in the page source, but all my results come back empty.

Example of a URL to scrape: https://fr.uefa.com/uefaeuropaleague/season=2020/matches/round=2001148/match=2028066/statistics/index.html?iv=true

In the page source I can see the statistics markup, for example this template fragment:

    … 0 ? (homeGoalsScored * 100 / (homeGoalsScored + awayGoalsScored)) : 0) + '%', class: 'goals-scored-graph-bar graph-bar' + (homeGoalsScored + awayGoalsScored === 0 ? 'graph-bar__zero' : ' ')}-->
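That fragment is template logic rather than data. It computes the width of the home team's goals-scored bar as the home side's percentage of all goals scored, and falls back to 0 when neither side has scored. Below is a rough Python transcription. It assumes homeGoalsScored and awayGoalsScored are plain numbers filled in by the page's script, and the function name is hypothetical:

```python
def home_bar_width(home_goals_scored, away_goals_scored):
    """Home team's share of the goals-scored bar, in percent (0 when no goals)."""
    total = home_goals_scored + away_goals_scored
    # Guard against dividing by zero, as the template's ternary does
    return home_goals_scored * 100 / total if total > 0 else 0


print(f"{home_bar_width(3, 1)}%")  # 75.0%
```

The numbers only exist once the page's script has supplied them. So the downloaded HTML carries the expression, not the values.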


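    import urllib.request
    from bs4 import BeautifulSoup

    link = 'https://fr.uefa.com/uefaeuropaleague/season=2020/matches/round=2001148/match=2028066/statistics/index.html?iv=true'
    req = urllib.request.Request(
        link,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
        }
    )
    matchs = []
    with urllib.request.urlopen(req) as urlpage:
        html = urlpage.read().decode()
        soup = BeautifulSoup(html, "html.parser")
        # Look for the statistic blocks by their CSS class
        stats = soup.find_all("div", class_='match-statistics--item')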

stats is empty.
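Here is one way to see why stats can be empty. A class name can appear in the raw source without any real element carrying it, for example inside a template comment like the fragment quoted above, which ends in -->. BeautifulSoup's find_all only matches real tags.

The stdlib sketch below checks for actual div start tags. Python's html.parser hands comment contents to a separate handler, so they are never treated as tags. The class and function names are hypothetical:

```python
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Record the class list of every real <div> start tag (comments are skipped)."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            for name, value in attrs:
                if name == "class" and value:
                    self.div_classes.append(value.split())


def has_div_with_class(html, class_name):
    """True only if some actual <div> element has class_name among its classes."""
    collector = DivClassCollector()
    collector.feed(html)
    collector.close()
    return any(class_name in classes for classes in collector.div_classes)


# Hypothetical page: the class only appears inside an HTML comment
sample = '<div id="stats"><!-- <div class="match-statistics--item"></div> --></div>'
print("match-statistics--item" in sample)                    # True
print(has_div_with_class(sample, "match-statistics--item"))  # False
```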

I tried many things, such as the full class name:

    soup.find_all("div", class_='match-statistics--goals-scored stats-visualization--horizontal-bar match-statistics--item')

and also selecting with soup.select.

I just want to get all the statistics, like:

    'Home', 'Total', 21
    'Away', 'Total', 6
    'Home', 'CADRÉS', 6
    'Away', 'CADRÉS', 3
    ...

Answer 1

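The data is loaded dynamically from an API, which you can find in the Network tab of your browser's developer tools. The home and away lists have different lengths, so I use itertools to make sure nothing is printed where a home or away item doesn't exist.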
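To make the itertools step concrete, here is an offline sketch of how zip_longest behaves when the two lists differ in length. The statistic names and values are hypothetical sample data, not taken from the API:

```python
from itertools import zip_longest

# Hypothetical sample data: the away side lacks one statistic
home = {"Total": 21, "Cadrés": 6, "Corners": 4}
away = {"Total": 6, "Cadrés": 3}

# zip_longest pairs keys by position and pads the shorter side with None
pairs = list(zip_longest(home.keys(), away.keys(), fillvalue=None))
print(pairs)  # [('Total', 'Total'), ('Cadrés', 'Cadrés'), ('Corners', None)]

for home_key, away_key in pairs:
    # Skip the padding so no line is printed for a missing item
    if home_key:
        print(", ".join(["Home", home_key, str(home[home_key])]))
    if away_key:
        print(", ".join(["Away", away_key, str(away[away_key])]))
```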

    import itertools
    import requests

    # Statistics endpoint the match page calls (visible in the Network tab)
    url = 'https://digital-api.uefa.com/v1/matches/2028066/statistics/team?language=FR'
    r = requests.get(url, timeout=10).json()

    # Map each statistic's display name to its value, for each team
    home = {i['typeDisplayName']: i['value'] for i in r['homeTeam']['statistics']}
    away = {i['typeDisplayName']: i['value'] for i in r['awayTeam']['statistics']}

    # The lists can differ in length; zip_longest pads the shorter one with None
    for item in itertools.zip_longest(home.keys(), away.keys(), fillvalue=None):
        if item[0]:
            print(', '.join(['Home', item[0], str(home[item[0]])]))
        if item[1]:
            print(', '.join(['Away', item[1], str(away[item[1]])]))
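zip_longest pairs the two key lists by position. If the home and away lists are ordered differently, or one side lacks a statistic in the middle, the Home and Away lines for the same statistic end up apart.

The variant below pairs by name instead. It assumes the response has the homeTeam/awayTeam → statistics → typeDisplayName/value shape used in the code above. stat_rows is a hypothetical helper name:

```python
def stat_rows(payload):
    """Return ('Home'|'Away', statistic name, value) tuples, grouped by statistic."""
    home = {s["typeDisplayName"]: s["value"] for s in payload["homeTeam"]["statistics"]}
    away = {s["typeDisplayName"]: s["value"] for s in payload["awayTeam"]["statistics"]}
    # Home order first, then any statistic only the away team has
    names = list(home) + [name for name in away if name not in home]
    rows = []
    for name in names:
        if name in home:
            rows.append(("Home", name, home[name]))
        if name in away:
            rows.append(("Away", name, away[name]))
    return rows


# Hypothetical payload in the assumed shape, with the away list in a different order
sample = {
    "homeTeam": {"statistics": [
        {"typeDisplayName": "Total", "value": 21},
        {"typeDisplayName": "Cadrés", "value": 6},
    ]},
    "awayTeam": {"statistics": [
        {"typeDisplayName": "Cadrés", "value": 3},
        {"typeDisplayName": "Total", "value": 6},
    ]},
}
for row in stat_rows(sample):
    print(row)
```

Against the live endpoint, you would pass it the parsed JSON from the same digital-api.uefa.com URL, e.g. stat_rows(requests.get(...).json()).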

Sample output:
