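I'm trying to scrape all of a player's data from Basketball Reference using BeautifulSoup. Take Michael Jordan as an example: https://www.basketball-reference.com/players/j/jordami01.html. The problem is that when I fetch the HTML page and parse it, I can only grab one data table; the others appear to be commented out. I'm new to Python and hope someone can tell me why the HTML has some of its data tables inside comments. Can anyone help me with this?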
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import pandas as pd
MJ_url = 'https://www.basketball-reference.com/players/j/jordami01.html'
uClient = uReq(MJ_url)
MJ_html = uClient.read()
uClient.close()
MJ_soup = soup(MJ_html, "html.parser")
MJ_containers = MJ_soup.findAll("table", {"class": "row_summable sortable stats_table"})
Answer 0 (score: 1)
Try this. All the data inside the comments should now come through:
import requests
from bs4 import BeautifulSoup, Comment
res = requests.get("https://www.basketball-reference.com/players/j/jordami01.html",headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text, 'lxml')
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    data = BeautifulSoup(comment, "lxml")
    for items in data.select("table.row_summable tr"):
        tds = [item.get_text(strip=True) for item in items.select("th,td")]
        print(tds)
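To see why the first attempt only finds one table, here is a small offline sketch of the same idea. The parser treats everything between `<!--` and `-->` as a single Comment string, so a plain search for `table` never looks inside it; each comment has to be parsed again as HTML. The sample markup, the table ids, the cell values, and both helper names (`rows_from_tables`, `rows_from_visible_and_commented`) are made up for illustration and are not taken from the real page. It uses `html.parser` instead of `lxml` only so that it runs without an extra install.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical miniature page (placeholder values, not real stats):
# one table in normal markup, one wrapped in an HTML comment.
SAMPLE_HTML = """
<div><table class="row_summable sortable stats_table" id="per_game">
<tr><th>Season</th><th>PTS</th></tr>
<tr><th>Season A</th><td>10</td></tr>
</table></div>
<div><!--
<table class="row_summable sortable stats_table" id="totals">
<tr><th>Season</th><th>PTS</th></tr>
<tr><th>Season A</th><td>20</td></tr>
</table>
--></div>
"""


def rows_from_tables(soup):
    """Map each row_summable table's id to its rows (lists of cell text)."""
    found = {}
    for table in soup.select("table.row_summable"):
        found[table.get("id")] = [
            [cell.get_text(strip=True) for cell in tr.select("th,td")]
            for tr in table.select("tr")
        ]
    return found


def rows_from_visible_and_commented(html):
    """Collect tables from the normal markup and from inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    found = rows_from_tables(soup)
    # Each Comment is plain text to the first parse, so parse it a second time.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        found.update(rows_from_tables(BeautifulSoup(comment, "html.parser")))
    return found


visible_only = rows_from_tables(BeautifulSoup(SAMPLE_HTML, "html.parser"))
everything = rows_from_visible_and_commented(SAMPLE_HTML)
print(sorted(visible_only))
print(sorted(everything))
```

On the sample, the first print lists only the table in normal markup, while the second also includes the one that was inside the comment. Keying the rows by table id is just one way to keep the tables apart; the real page may need a different key.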