For some reason, I can't retrieve the table I want from this page.
It is the table titled "Line Score"; here is its HTML:
<table class="suppress_all sortable stats_table now_sortable" id="line_score" data-cols-to-freeze="1"><thead><tr>
<th> </th>
<th colspan="5">Scoring</th>
</tr></thead><caption>Line Score Table</caption><tbody>
<tr class="thead" data-row="0">
<th> </th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>T</th>
</tr>
<tr data-row="1">
<td><a href="/teams/LAL/2020.html">LAL</a></td>
<td class="center">25</td>
<td class="center">29</td>
<td class="center">31</td>
<td class="center">17</td>
<td class="center"><strong>102</strong></td>
</tr>
<tr data-row="2">
<td><a href="/teams/LAC/2020.html">LAC</a></td>
<td class="center">22</td>
<td class="center">40</td>
<td class="center">23</td>
<td class="center">27</td>
<td class="center"><strong>112</strong></td>
</tr>
</tbody></table>
My code is as follows:
import requests as r
import bs4 as bs
link = "https://basketball-reference.com/boxscores/201910220LAC.html"
resp = r.get(link)
soup = bs.BeautifulSoup(resp.content, 'lxml')
table = soup.find('table', {'class':'suppress_all sortable stats_table now_sortable'})
print(table)
I want to use the data in <tr data-row="1"> and <tr data-row="2"> elsewhere later.
I'm sure this is easy, but I can't figure it out. Any help is much appreciated.
Thanks, Luis
Answer 0 (score: 0)
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
import pandas as pd

# Run Firefox without opening a browser window
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://www.basketball-reference.com/boxscores/201910220LAC.html")
time.sleep(2)  # crude wait so the page's JavaScript can finish rendering
# read_html returns every table on the page; the line score was at index 18 here
df = pd.read_html(driver.page_source)[18:19]
driver.quit()  # close the browser once the page source has been read
print(df)
Output:
[ Unnamed: 0 Scoring Scoring.1 Scoring.2 Scoring.3 Scoring.4
0 NaN 1 2 3 4 T
1 LAL 25 29 31 17 102
2 LAC 22 40 23 27 112]
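
If starting a browser is more than you need, it may be possible to work from the raw response. One possible reason the `requests` version finds nothing is that the table is not live markup in the static HTML, for example because it is shipped inside an HTML comment and only un-commented by JavaScript. That is an assumption about this site, not something the thread confirms. A second possibility is that part of the class string (such as `now_sortable`) is added by a script, in which case matching on `id="line_score"` is safer than matching on the full class string. The sketch below uses only the standard library and runs on a shortened copy of the HTML from the question. The `LineScoreParser` class, the `sample` string, and the comment wrapper around the table are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class LineScoreParser(HTMLParser):
    """Collect the cell text of every row in <table id="line_score">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_comment(self, data):
        # Assumption: the table may arrive inside an HTML comment,
        # so a comment that mentions the id is parsed as markup too.
        if 'id="line_score"' in data:
            inner = LineScoreParser()
            inner.feed(data)
            inner.close()
            self.rows.extend(inner.rows)

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "line_score":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])  # start a new row
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")  # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_table and self.in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical stand-in for resp.text: the question's table wrapped in a comment.
sample = """
<!--
<table class="suppress_all sortable stats_table" id="line_score">
<tbody>
<tr class="thead"><th> </th><th>1</th><th>2</th><th>3</th><th>4</th><th>T</th></tr>
<tr><td><a href="/teams/LAL/2020.html">LAL</a></td><td class="center">25</td>
<td class="center">29</td><td class="center">31</td><td class="center">17</td>
<td class="center"><strong>102</strong></td></tr>
<tr><td><a href="/teams/LAC/2020.html">LAC</a></td><td class="center">22</td>
<td class="center">40</td><td class="center">23</td><td class="center">27</td>
<td class="center"><strong>112</strong></td></tr>
</tbody></table>
-->
"""

parser = LineScoreParser()
parser.feed(sample)
parser.close()

# Keep only rows that start with a team abbreviation; header rows start with an empty cell.
scores = {row[0]: [int(c) for c in row[1:]] for row in parser.rows if row and row[0]}
print(scores)
```

With the real page you would feed `resp.text` instead of `sample`. If `scores` comes back empty, the table is probably built entirely by JavaScript and a browser-based approach like the Selenium answer is the fallback.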