I am trying to use Beautiful Soup to parse specific content from a page. Can you tell me how I can do this? Code:
import re
import pytz
import requests
import datetime
from flask import url_for
from bs4 import BeautifulSoup
from urllib.parse import urljoin
link = "http://www.espncricinfo.com/series/_/id/8038/season/2018/icc-world-cup-qualifiers/"
r = requests.get(link)
bigbash_article_html = r.text
soup = BeautifulSoup(bigbash_article_html, "html.parser")
details = soup.find("div",{"class":"module-list performers"})
bigbash_article_dict = {}
for div in details:
    image_div = div.find("div", {"class": "img-container player"})
I don't know how to proceed from here. I am expecting output like the following:
Expected output:
Top Run Scorers:
[{'playerimage':'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true','playername':'TP Ura','player-details':'PNG, Right-hand bat','runs':'188','innings':'2','Average':'94.00'},..............................................................................................}]
The same for the other column, Top Wicket Takers:
[{'playerimage':'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true','playername':'Ehsan Khan','player-details':'HKG, Right-arm offbreak','wickets':'9','innings':'3','Average':'12.55'},..............................................................................................}]
Answer 0 (score: 1)
First, you are searching for the wrong tag. The content you want is inside <ul class="module-list performers">, not inside a div tag with the same class name.
The Top Run Scorers table is inside the <div id="r-0"> tag. Each player is inside an li tag, and you can get all of a player's details from within that li tag.
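To make the difference between the two lookups concrete, here is a minimal sketch. The HTML string is a hypothetical, heavily simplified stand-in modeled on the structure described above, not the real page, so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the description above;
# the real page contains far more elements.
html = """
<div id="r-0">
  <ul class="module-list performers">
    <li><div class="content-meta"><p><a href="#">TP Ura</a>, PNG, Right-hand bat</p></div></li>
    <li><div class="content-meta"><p><a href="#">Mohammad Nabi</a>, AFG, Right-hand bat</p></div></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# No div carries that class, so this lookup returns None.
wrong = soup.find("div", {"class": "module-list performers"})

# The same class on a ul tag is found.
right = soup.find("ul", {"class": "module-list performers"})

# Each player sits in its own li inside the div with id "r-0".
items = soup.find("div", id="r-0").find_all("li")

print(wrong)
print(right is not None)
print(len(items))
```

Because find returns None when nothing matches, the loop in the question never reaches a usable element.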
I'll show you how to get the image, name, and player details for the Top Run Scorers.
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.espncricinfo.com/series/_/id/8038/season/2018/icc-world-cup-qualifiers')
soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires the lxml package
top_run_scorers = []
# Each li inside <div id="r-0"> holds one player.
for player in soup.find('div', id='r-0').find_all('li'):
    image = player.find('img')['src']
    info = player.find('div', class_='content-meta')
    name = info.find('a').text
    # The last child of the p tag is the text that follows the name link.
    details = info.p.contents[-1]
    top_run_scorers.append({'playerimage': image, 'playername': name, 'player-details': details})
print(top_run_scorers)
Output:
[{'player-details': ', PNG, Right-hand bat',
'playerimage': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true',
'playername': 'TP Ura'},
{'player-details': ', AFG, Right-hand bat',
'playerimage': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/25913.png&h=55&w=40&scale=crop&transparent=true',
'playername': 'Mohammad Nabi'},
{'player-details': ', WI, Left-hand bat',
'playerimage': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true',
'playername': 'SO Hetmyer'}]
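Note that each 'player-details' value above starts with ', ', while the expected output in the question does not. One possible way to remove it, shown as a sketch on a hypothetical one-player HTML fragment rather than the live page, is to strip commas and spaces from both ends of the text node:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for one li of the real page.
html = """
<li>
  <div class="content-meta"><p><a href="#">TP Ura</a>, PNG, Right-hand bat</p></div>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
info = soup.find("div", class_="content-meta")

# The text node after the name link still carries the separator.
raw_details = info.p.contents[-1]

# strip(", ") removes any leading or trailing commas and spaces.
clean_details = raw_details.strip(", ")

print(repr(raw_details))
print(repr(clean_details))
```

Commas inside the string are untouched, because strip only works on the two ends.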
Answer 1 (score: 1)
Select all list items inside the element that has both the class names sub-module and performers, then parse the player details from each list item, e.g.:
import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.espncricinfo.com/series/_/id/8038/season/2018/icc-world-cup-qualifiers/")
soup = BeautifulSoup(r.text, "html.parser")
# Matches the list items of both performer lists (run scorers and wicket takers).
toprunners = soup.select(".sub-module.performers li")

def player(li):
    name_and_details = li.select_one('p')
    name = name_and_details.a
    details = name.next_sibling  # text node that follows the name link
    stats = li.select_one('.overall-stats p')
    img = li.select_one('.focus-image')
    return {
        'player_name': name.text,
        'player_details': details.strip(', '),
        'player_image': img.attrs['src'],
        # For wicket takers this value is the wicket count.
        'runs': name_and_details.next_sibling.text,
        'innings': stats.span.text,
        'average': stats.next_sibling.span.text,
    }

players = [player(li) for li in toprunners]
In[2]: print(players)
[{'player_name': 'TP Ura', 'player_details': 'PNG, Right-hand bat', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true', 'runs': '188', 'innings': '2', 'average': '94.00'}, {'player_name': 'Mohammad Nabi', 'player_details': 'AFG, Right-hand bat', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/25913.png&h=55&w=40&scale=crop&transparent=true', 'runs': '181', 'innings': '3', 'average': '60.33'}, {'player_name': 'SO Hetmyer', 'player_details': 'WI, Left-hand bat', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true', 'runs': '171', 'innings': '3', 'average': '57.00'}, {'player_name': 'Ehsan Khan', 'player_details': 'HKG, Right-arm offbreak', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true', 'runs': '9', 'innings': '3', 'average': '12.55'}, {'player_name': 'Mujeeb Ur Rahman', 'player_details': 'AFG, Right-arm offbreak', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true', 'runs': '8', 'innings': '3', 'average': '15.25'}, {'player_name': 'JO Holder', 'player_details': 'WI, Right-arm medium-fast', 'player_image': 'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/391485.png&h=55&w=40&scale=crop&transparent=true', 'runs': '7', 'innings': '3', 'average': '21.28'}]
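The list above mixes both columns: the first three entries are run scorers and the last three are wicket takers, whose wicket counts sit under the 'runs' key. If you want two separate lists as in the question, one option is to split the result afterwards. The sketch below is only an illustration: split_performers and its per_list parameter are hypothetical names, the sample data is an abbreviated copy of four entries from the output above, and it assumes both columns have the same fixed length.

```python
# Abbreviated sample copied from the output above (image URLs omitted).
players = [
    {'player_name': 'TP Ura', 'player_details': 'PNG, Right-hand bat',
     'runs': '188', 'innings': '2', 'average': '94.00'},
    {'player_name': 'Mohammad Nabi', 'player_details': 'AFG, Right-hand bat',
     'runs': '181', 'innings': '3', 'average': '60.33'},
    {'player_name': 'Ehsan Khan', 'player_details': 'HKG, Right-arm offbreak',
     'runs': '9', 'innings': '3', 'average': '12.55'},
    {'player_name': 'Mujeeb Ur Rahman', 'player_details': 'AFG, Right-arm offbreak',
     'runs': '8', 'innings': '3', 'average': '15.25'},
]


def split_performers(players, per_list):
    """Split the combined list into (run scorers, wicket takers).

    Assumes the first per_list entries are run scorers and the rest
    are wicket takers, as in the output above.
    """
    scorers = [dict(p) for p in players[:per_list]]
    takers = []
    for p in players[per_list:]:
        entry = dict(p)  # copy so the input list is left unchanged
        entry['wickets'] = entry.pop('runs')  # rename the misleading key
        takers.append(entry)
    return scorers, takers


top_run_scorers, top_wicket_takers = split_performers(players, per_list=2)
print(top_run_scorers)
print(top_wicket_takers)
```

On the real output you would call it with per_list=3. Selecting each column's own container in the HTML would be more robust than relying on position, but the identifier of the wicket takers container is not shown in this thread.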