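I need to parse this number from a website that has been updated. I cannot reach the part of the HTML that sits inside the data-component element.
I have already tried XPath parsing and bs4: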
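import requests
from bs4 import BeautifulSoup

url = "https://www.muthead.com/20/players/10111309/upgrades/"
r = requests.get(url)
content = r.text
soup = BeautifulSoup(content,"lxml")
# soup.find() returns None when no matching element exists, so .text fails here
hello = soup.find("div",class_="average rating-list__RatingValue-ubw14i-3 jzOWLB").text
print(hello)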
I get the following error:
hello = soup.find("div",class_="average rating-list__RatingValue-ubw14i-3 jzOWLB").text
builtins.AttributeError: 'NoneType' object has no attribute 'text'
I need to scrape the 77 from this div in the HTML:
<div class="average rating-list__RatingValue-ubw14i-3 jzOWLB"> 77 </div>
It seems that bs4 cannot see inside the main container:
<div data-component="player-upgrades" data-props='{"externalId": 10111309, "gameSlug": "20", "basePath": "/20/players/10111309/upgrades/"}'>
Answer 0 (score: 0)
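The content on the site is rendered with JavaScript, so the HTML that requests downloads does not contain the rating div, and BeautifulSoup is of little use in your case. That said, I suggest fetching all the player stats directly from the API with the following code: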
import requests
url = "https://www.muthead.com/api/mutdb/player_item/?expand=game%2Cposition%2Cprogram%2Cteam%2Cupgrade_tiers&external_id=10111309&game__slug=20"
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0"}
req_raw = requests.get(url, headers=headers).json()
spd = req_raw["results"][0]["stats"][1]["value"]
print(spd)
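The chained lookup req_raw["results"][0]["stats"][1]["value"] raises an IndexError or KeyError if the API returns no results or a shorter stats list. A minimal sketch of a more defensive lookup follows. The function name extract_stat is hypothetical, and the sample payload is made up: it mimics only the keys used above and is not real API output.

```python
def extract_stat(payload, stat_index=1):
    """Return results[0]["stats"][stat_index]["value"], or None if it is absent.

    Hypothetical helper: it assumes only the JSON shape used in the answer.
    """
    results = payload.get("results") or []
    if not results:
        return None
    stats = results[0].get("stats") or []
    if not 0 <= stat_index < len(stats):
        return None
    return stats[stat_index].get("value")


# Made-up payload that mimics only the keys accessed above (not real API output).
sample = {"results": [{"stats": [{"value": 80}, {"value": 77}]}]}

print(extract_stat(sample))           # second stat of the first result
print(extract_stat({"results": []}))  # None instead of an IndexError
```

With a real response you would pass in the parsed JSON, for example extract_stat(requests.get(url, headers=headers).json()).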
If you plan to work with many links, you may want to use the following code instead:
import requests
url_raw = "https://www.muthead.com/20/players/10111309/upgrades/"
external_id = url_raw.split("/")[5]
game_slug = url_raw.split("/")[3]
url = "https://www.muthead.com/api/mutdb/player_item/?expand=game%2Cposition%2Cprogram%2Cteam%2Cupgrade_tiers&external_id={}&game__slug={}".format(external_id, game_slug)
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0"}
req_raw = requests.get(url, headers=headers).json()
spd = req_raw["results"][0]["stats"][1]["value"]
print(spd)
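You only need to replace url_raw with the URL of the page you want to scrape, and the script will send the request to the API automatically (no need to look up the API link yourself).
(Note that both snippets output 77.)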
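For many links, the URL handling can be wrapped in a small function. The sketch below is one possible way to do it, using urllib.parse from the standard library in place of indexing into url_raw.split("/"). The name build_api_url is hypothetical, the endpoint and query parameters are copied from the code above, and the check on the path layout is an assumption based on the single example URL.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint copied from the answer above.
API_ENDPOINT = "https://www.muthead.com/api/mutdb/player_item/"


def build_api_url(page_url):
    """Turn a player page URL into the matching API URL (hypothetical helper)."""
    # Assumed path layout: /<game_slug>/players/<external_id>/upgrades/
    parts = urlsplit(page_url).path.strip("/").split("/")
    if len(parts) < 3 or parts[1] != "players":
        raise ValueError("unexpected player URL: {}".format(page_url))
    game_slug, external_id = parts[0], parts[2]
    # urlencode percent-encodes the commas as %2C, matching the original URL.
    query = urlencode({
        "expand": "game,position,program,team,upgrade_tiers",
        "external_id": external_id,
        "game__slug": game_slug,
    })
    return "{}?{}".format(API_ENDPOINT, query)


print(build_api_url("https://www.muthead.com/20/players/10111309/upgrades/"))
```

Each returned URL can then be passed to requests.get together with the same headers as before.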
Hope this helps!