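I'm having some trouble parsing Basketball Reference. The page I'm looking at (https://www.basketball-reference.com/contracts/IND.html) seems very bloated, with lots of ad trackers and irrelevant menus. I'm trying to extract the data table captioned "Payroll Table", which has the following HTML source (buried in a bunch of other junk, or at least it looks like junk to me).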
<table class="suppress_glossary sortable stats_table" id="contracts" data-cols-to-freeze=1><caption>Payroll Table</caption>
<colgroup><col><col><col><col><col><col><col><col><col><col></colgroup>
<thead>
<tr class="over_header">
<th aria-label="" data-stat=" " colspan="2" class=" over_header center" > </th>
<th aria-label="" data-stat="header_salary" colspan="6" class=" over_header center" >Salary</th>
<th aria-label="" data-stat=" " colspan="2" class=" over_header center" > </th>
</tr>
<tr>
<th aria-label="Player" data-stat="player" scope="col" class=" poptip sort_default_asc center" >Player</th>
<th aria-label="Age" data-stat="age_today" scope="col" class=" poptip center" >Age</th>
<th aria-label="2019-20" data-stat="y1" scope="col" class=" poptip center" data-over-header="Salary" >2019-20</th>
<th aria-label="2020-21" data-stat="y2" scope="col" class=" poptip center" data-over-header="Salary" >2020-21</th>
<th aria-label="2021-22" data-stat="y3" scope="col" class=" poptip center" data-over-header="Salary" >2021-22</th>
<th aria-label="2022-23" data-stat="y4" scope="col" class=" poptip center" data-over-header="Salary" >2022-23</th>
<th aria-label="2023-24" data-stat="y5" scope="col" class=" poptip center" data-over-header="Salary" >2023-24</th>
<th aria-label="2024-25" data-stat="y6" scope="col" class=" poptip center" data-over-header="Salary" >2024-25</th>
<th aria-label="Signed Using" data-stat="signed_using" scope="col" class=" poptip sort_default_asc center" >Signed Using</th>
<th aria-label="The amount of a player's remaining salary that is guaranteed." data-stat="remain_gtd" scope="col" class=" poptip center" data-tip="The amount of a player's remaining salary that is guaranteed." >Guaranteed</th>
</tr>
</thead>
<tbody>
<tr ><th scope="row" class="left " data-append-csv="oladivi01" data-stat="player" csk="oladivi01" ><a href="/players/o/oladivi01.html">Victor Oladipo</a></th><td class="center " data-stat="age_today" >27</td><td class="right " data-stat="y1" csk="21000000" >$21,000,000</td><td class="right " data-stat="y2" csk="21000000" >$21,000,000</td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round Pick</td><td class="right " data-stat="remain_gtd" csk="42000000" >$42,000,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="brogdma01" data-stat="player" csk="brogdma01" ><a href="/players/b/brogdma01.html">Malcolm Brogdon</a></th><td class="center " data-stat="age_today" >26</td><td class="right " data-stat="y1" csk="20000000" >$20,000,000</td><td class="right " data-stat="y2" csk="20700000" >$20,700,000</td><td class="right " data-stat="y3" csk="21700000" >$21,700,000</td><td class="right " data-stat="y4" csk="22600000" >$22,600,000</td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" csk="85000000" >$85,000,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="turnemy01" data-stat="player" csk="turnemy01" ><a href="/players/t/turnemy01.html">Myles Turner</a></th><td class="center " data-stat="age_today" >23</td><td class="right " data-stat="y1" csk="18000000" >$18,000,000</td><td class="right " data-stat="y2" csk="18000000" >$18,000,000</td><td class="right " data-stat="y3" csk="18000000" >$18,000,000</td><td class="right " data-stat="y4" csk="18000000" >$18,000,000</td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st round pick</td><td class="right " data-stat="remain_gtd" csk="72000000" >$72,000,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="warretj01" data-stat="player" csk="warretj01" ><a href="/players/w/warretj01.html">T.J. Warren</a></th><td class="center " data-stat="age_today" >26</td><td class="right " data-stat="y1" csk="10810000" >$10,810,000</td><td class="right " data-stat="y2" csk="11750000" >$11,750,000</td><td class="right " data-stat="y3" csk="12690000" >$12,690,000</td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round Pick</td><td class="right " data-stat="remain_gtd" csk="35250000" >$35,250,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="lambje01" data-stat="player" csk="lambje01" ><a href="/players/l/lambje01.html">Jeremy Lamb</a></th><td class="center " data-stat="age_today" >27</td><td class="right " data-stat="y1" csk="10500000" >$10,500,000</td><td class="right " data-stat="y2" csk="10500000" >$10,500,000</td><td class="right " data-stat="y3" csk="10500000" >$10,500,000</td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" csk="31500000" >$31,500,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="mcderdo01" data-stat="player" csk="mcderdo01" ><a href="/players/m/mcderdo01.html">Doug McDermott</a></th><td class="center " data-stat="age_today" >27</td><td class="right " data-stat="y1" csk="7333334" >$7,333,334</td><td class="right " data-stat="y2" csk="7333333" >$7,333,333</td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" csk="14666667" >$14,666,667</td></tr>
<tr ><th scope="row" class="left " data-append-csv="holidju01" data-stat="player" csk="holidju01" ><a href="/players/h/holidju01.html">Justin Holiday</a></th><td class="center " data-stat="age_today" >30</td><td class="right " data-stat="y1" csk="4767000" >$4,767,000</td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Room Exception</td><td class="right " data-stat="remain_gtd" csk="4767000" >$4,767,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="sabondo01" data-stat="player" csk="sabondo01" ><a href="/players/s/sabondo01.html">Domantas Sabonis</a></th><td class="center " data-stat="age_today" >23</td><td class="right " data-stat="y1" csk="3529555" >$3,529,555</td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round pick</td><td class="right " data-stat="remain_gtd" csk="3529555" >$3,529,555</td></tr>
<tr ><th scope="row" class="left " data-append-csv="mccontj01" data-stat="player" csk="mccontj01" ><a href="/players/m/mccontj01.html">T.J. McConnell</a></th><td class="center " data-stat="age_today" >27</td><td class="right " data-stat="y1" csk="3500000" >$3,500,000</td><td class="right " data-stat="y2" csk="3500000" ><em>$3,500,000</em></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Cap Space</td><td class="right " data-stat="remain_gtd" csk="4500000" >$4,500,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="bitadgo01" data-stat="player" csk="bitadgo01" ><a href="/players/b/bitadgo01.html">Goga Bitadze</a></th><td class="center " data-stat="age_today" >20</td><td class="right " data-stat="y1" csk="2816760" >$2,816,760</td><td class="right " data-stat="y2" csk="2957520" >$2,957,520</td><td class="right salary-tm" data-stat="y3" csk="3098400" >$3,098,400</td><td class="right salary-tm" data-stat="y4" csk="4765339" >$4,765,339</td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round Pick</td><td class="right " data-stat="remain_gtd" csk="5774280" >$5,774,280</td></tr>
<tr ><th scope="row" class="left " data-append-csv="leaftj01" data-stat="player" csk="leaftj01" ><a href="/players/l/leaftj01.html">T.J. Leaf</a></th><td class="center " data-stat="age_today" >22</td><td class="right " data-stat="y1" csk="2813280" >$2,813,280</td><td class="right salary-tm" data-stat="y2" csk="4326825" >$4,326,825</td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round Pick</td><td class="right " data-stat="remain_gtd" csk="2813280" >$2,813,280</td></tr>
<tr ><th scope="row" class="left " data-append-csv="holidaa01" data-stat="player" csk="holidaa01" ><a href="/players/h/holidaa01.html">Aaron Holiday</a></th><td class="center " data-stat="age_today" >23</td><td class="right " data-stat="y1" csk="2239200" >$2,239,200</td><td class="right salary-tm" data-stat="y2" csk="2345640" >$2,345,640</td><td class="right salary-tm" data-stat="y3" csk="3980551" >$3,980,551</td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >1st Round Pick</td><td class="right " data-stat="remain_gtd" csk="2239200" >$2,239,200</td></tr>
<tr ><th scope="row" class="left " data-append-csv="sumneed01" data-stat="player" csk="sumneed01" ><a href="/players/s/sumneed01.html">Edmond Sumner</a></th><td class="center " data-stat="age_today" >23</td><td class="right " data-stat="y1" csk="2000000" >$2,000,000</td><td class="right " data-stat="y2" csk="2160000" >$2,160,000</td><td class="right salary-tm" data-stat="y3" csk="2320000" >$2,320,000</td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" csk="4160000" >$4,160,000</td></tr>
<tr ><th scope="row" class="left " data-append-csv="sampsja02" data-stat="player" csk="sampsja02" ><a href="/players/s/sampsja02.html">JaKarr Sampson</a></th><td class="center " data-stat="age_today" >26</td><td class="right " data-stat="y1" csk="1737145" >$1,737,145</td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Minimum Salary</td><td class="right " data-stat="remain_gtd" csk="1737145" >$1,737,145</td></tr>
<tr ><th scope="row" class="left " data-append-csv="johnsal02" data-stat="player" csk="johnsal02" ><a href="/players/j/johnsal02.html">Alize Johnson</a></th><td class="center " data-stat="age_today" >23</td><td class="right " data-stat="y1" csk="1416852" >$1,416,852</td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Minimum Salary</td><td class="right " data-stat="remain_gtd" csk="1416852" >$1,416,852</td></tr>
<tr ><th scope="row" class="left " data-append-csv="mitrona01" data-stat="player" csk="mitrona01" ><a href="/players/m/mitrona01.html">Naz Mitrou-Long</a></th><td class="center " data-stat="age_today" >26</td><td class="right " data-stat="y1" > </td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Two-Way Contract</td><td class="right " data-stat="remain_gtd" > </td></tr>
<tr ><th scope="row" class="left " data-append-csv="wilcocj01" data-stat="player" csk="wilcocj01" ><a href="/players/w/wilcocj01.html">C.J. Wilcox</a></th><td class="center " data-stat="age_today" >28</td><td class="right iz" data-stat="y1" ></td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Minimum Salary</td><td class="right iz" data-stat="remain_gtd" ></td></tr>
<tr ><th scope="row" class="left " data-append-csv="brimaam01" data-stat="player" csk="brimaam01" ><a href="/players/b/brimaam01.html">Amida Brimah</a></th><td class="center " data-stat="age_today" >25</td><td class="right iz" data-stat="y1" ></td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Minimum Salary</td><td class="right iz" data-stat="remain_gtd" ></td></tr>
<tr ><th scope="row" class="left " data-append-csv="gantja01" data-stat="player" csk="gantja01" ><a href="/players/g/gantja01.html">Jakeenan Gant</a></th><td class="center " data-stat="age_today" >23</td><td class="right iz" data-stat="y1" ></td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Minimum Salary</td><td class="right iz" data-stat="remain_gtd" ></td></tr>
<tr ><th scope="row" class="left " data-append-csv="bowenbr02" data-stat="player" csk="bowenbr02" ><a href="/players/b/bowenbr02.html">Brian Bowen</a></th><td class="center " data-stat="age_today" >21</td><td class="right " data-stat="y1" > </td><td class="right iz" data-stat="y2" ></td><td class="right iz" data-stat="y3" ></td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left " data-stat="signed_using" >Two-Way Contract</td><td class="right " data-stat="remain_gtd" > </td></tr>
<tr class='thead'><td colspan='10'></td></tr>
<tr class="partial_table" ><th scope="row" class="left " data-append-csv="ellismo01" data-stat="player" csk="ellismo01" ><a href="/players/e/ellismo01.html"><em>Monta Ellis</em></a></th><td class="center " data-stat="age_today" >33</td><td class="right " data-stat="y1" csk="2245400" >$2,245,400</td><td class="right " data-stat="y2" csk="2245400" >$2,245,400</td><td class="right " data-stat="y3" csk="2245400" >$2,245,400</td><td class="right iz" data-stat="y4" ></td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" csk="6736200" >$6,736,200</td></tr>
</tbody>
<tfoot><tr ><th scope="row" class="left " data-stat="player" >Team Totals</th><td class="center iz" data-stat="age_today" ></td><td class="right " data-stat="y1" >$114,708,526</td><td class="right " data-stat="y2" >$106,818,718</td><td class="right " data-stat="y3" >$74,534,351</td><td class="right " data-stat="y4" >$45,365,339</td><td class="right iz" data-stat="y5" ></td><td class="right iz" data-stat="y6" ></td><td class="left iz" data-stat="signed_using" ></td><td class="right " data-stat="remain_gtd" >$318,090,179</td></tr>
</tfoot>
</table>
When I run the following Python code, the variable l comes back empty.
#import beautiful soup, requests, time, pandas
from bs4 import BeautifulSoup
import requests
#assign the URL for contract scraping
url = 'https://www.basketball-reference.com/teams/IND.html'
#pull html from page
page = requests.get(url)
#format html using BS
soup = BeautifulSoup(page.text, "html.parser")
#take only table rows
l = soup.find_all('a',{'class':'left'})
print(l)
I'm wondering whether I have the wrong class argument. Or is there some other reason print(l) returns []?
Answer 0 (score: 1)
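The left class you're after is not on the anchor tag, which is why you get zero records. In the markup you posted it sits on the th element that contains the link. Note also that your code requests /teams/IND.html rather than the /contracts/IND.html page you describe. Try the following code.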
from bs4 import BeautifulSoup
import requests
r=requests.get("https://www.basketball-reference.com/contracts/IND.html")
soup=BeautifulSoup(r.text,'html.parser')
l=soup.select('.left > a')
print(l)
If you want the players' names:
from bs4 import BeautifulSoup
import requests
r=requests.get("https://www.basketball-reference.com/contracts/IND.html")
soup=BeautifulSoup(r.text,'html.parser')
l=[item.text for item in soup.select('.left > a')]
print(l)
Output:
['Victor Oladipo', 'Malcolm Brogdon', 'Myles Turner', 'T.J. Warren', 'Jeremy Lamb', 'Doug McDermott', 'Justin Holiday', 'Domantas Sabonis', 'T.J. McConnell', 'Goga Bitadze', 'T.J. Leaf', 'Aaron Holiday', 'Edmond Sumner', 'JaKarr Sampson', 'Alize Johnson', 'Brian Bowen', 'Naz Mitrou-Long', 'C.J. Wilcox', 'Amida Brimah', 'Jakeenan Gant', 'Monta Ellis']
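To see why the original lookup returns an empty list, the sketch below runs both lookups against a trimmed copy of one row from the question's table, so no network request is needed. The html string is an abbreviated stand-in for the real page, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of one row from the question's table (an assumption for illustration).
html = """
<table id="contracts"><tbody>
<tr><th scope="row" class="left " data-stat="player"><a href="/players/o/oladivi01.html">Victor Oladipo</a></th>
<td class="center " data-stat="age_today">27</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The <a> tag itself carries no class attribute, so filtering anchors by class matches nothing.
by_anchor_class = soup.find_all("a", {"class": "left"})

# The class is on the parent <th>, so select anchors that are direct children of a .left element.
by_parent_class = soup.select(".left > a")
names = [a.get_text() for a in by_parent_class]

print(by_anchor_class)
print(names)
```

The first print shows the empty result from the question; the second shows the name found through the parent element's class.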
Answer 1 (score: 1)
You said you want the payroll table. You can use pandas read_html for that.
import pandas as pd
table = pd.read_html('https://www.basketball-reference.com/contracts/IND.html')[0]
print(table)
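With read_html the salary cells will typically come back as display strings such as $21,000,000. If raw numbers are needed, one option is to read the csk attribute that the question's markup carries on each salary cell. The following is only a sketch on an abbreviated copy of that markup: the helper name parse_rows is hypothetical, and it assumes the live page keeps the same id, data-stat, and csk attributes.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup: one player row plus the spacer row.
html = """
<table id="contracts">
<tbody>
<tr><th scope="row" class="left " data-stat="player" csk="holidju01"><a href="/players/h/holidju01.html">Justin Holiday</a></th><td class="center " data-stat="age_today">30</td><td class="right " data-stat="y1" csk="4767000">$4,767,000</td><td class="right iz" data-stat="y2"></td></tr>
<tr class='thead'><td colspan='4'></td></tr>
</tbody>
</table>
"""

def parse_rows(markup):
    """Return one dict per player row, keyed by each cell's data-stat value."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table", id="contracts")
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        # Skip spacer rows, which have no player header cell.
        if tr.find("th", {"data-stat": "player"}) is None:
            continue
        record = {}
        for cell in tr.find_all(["th", "td"]):
            stat = cell.get("data-stat")
            text = cell.get_text(strip=True)
            if stat == "player":
                # Keep the visible name rather than the csk sort key.
                record[stat] = text
            elif cell.has_attr("csk"):
                # Salary cells expose the unformatted amount in csk.
                record[stat] = int(cell["csk"])
            else:
                # Cells without csk keep their text; empty cells become None.
                record[stat] = text or None
        rows.append(record)
    return rows

rows = parse_rows(html)
print(rows)
```

Each dict can then be passed to pd.DataFrame(rows) if a table is still the goal. Cells without a csk attribute, such as age_today, stay as strings in this sketch.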