Scraping a table from a web page with BeautifulSoup

Time: 2019-12-27 04:57:10

Tags: python web-scraping beautifulsoup

For some reason, I can't retrieve the table I want from this page.

The table is the one titled "Line Score". Here is its HTML:

<table class="suppress_all sortable stats_table now_sortable" id="line_score" data-cols-to-freeze="1"><thead><tr>
<th>&nbsp;</th>
<th colspan="5">Scoring</th>
</tr></thead><caption>Line Score Table</caption><tbody>
<tr class="thead" data-row="0">
<th>&nbsp;</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>T</th>
</tr>
<tr data-row="1">
<td><a href="/teams/LAL/2020.html">LAL</a></td>
<td class="center">25</td>
<td class="center">29</td>
<td class="center">31</td>
<td class="center">17</td>
<td class="center"><strong>102</strong></td>
</tr>
<tr data-row="2">
<td><a href="/teams/LAC/2020.html">LAC</a></td>
<td class="center">22</td>
<td class="center">40</td>
<td class="center">23</td>
<td class="center">27</td>
<td class="center"><strong>112</strong></td>
</tr>

</tbody></table>

My code is as follows:

import requests as r
import bs4 as bs

link = "https://basketball-reference.com/boxscores/201910220LAC.html"
resp = r.get(link)

soup = bs.BeautifulSoup(resp.content, 'lxml')

table = soup.find('table', {'class':'suppress_all sortable stats_table now_sortable'})

print(table)

Later on, I want to use the data in <tr data-row="1"> and <tr data-row="2"> elsewhere.

I'm sure this is easy, but I just can't figure it out. Any help would be much appreciated.

Thanks, Louis

1 answer:

Answer 0: (score: 0)

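from io import StringIO
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
import pandas as pd

# Run Firefox without opening a window
options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get("https://www.basketball-reference.com/boxscores/201910220LAC.html")
time.sleep(2)  # give the page's JavaScript time to build the tables
html = driver.page_source
driver.quit()  # close the browser once the rendered HTML is captured

# read_html returns every table on the page; the line score was at index 18
# (a positional index, so it may shift if the page layout changes)
df = pd.read_html(StringIO(html))[18:19]

print(df)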

Output:

[  Unnamed: 0  Scoring  Scoring.1  Scoring.2  Scoring.3 Scoring.4
0        NaN        1          2          3          4         T
1        LAL       25         29         31         17       102
2        LAC       22         40         23         27       112]
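
If you would rather not launch a browser, here is a rough sketch of a requests-friendly alternative. It rests on two assumptions that I have not verified against the live page: that the raw response ships the line score table inside an HTML comment, and that the `now_sortable` class is only added by JavaScript (which would explain why the class lookup in the question finds nothing). The helper names `find_table` and `line_score_rows` are made up for illustration, and `SAMPLE` is a trimmed copy of the HTML from the question wrapped in a comment to mimic that assumed layout. Looking the table up by `id="line_score"` avoids depending on the exact class string.

```python
from bs4 import BeautifulSoup, Comment


def find_table(html, table_id):
    """Return the <table> with the given id, also searching inside HTML comments."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    # Assumption: the table may be delivered inside an HTML comment and only
    # uncommented by JavaScript in a real browser.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if table_id in comment:
            inner = BeautifulSoup(comment, "html.parser")
            table = inner.find("table", id=table_id)
            if table is not None:
                return table
    return None


def line_score_rows(table):
    """Map each team abbreviation to its list of integer scores."""
    scores = {}
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")  # header rows contain only <th>, so they are skipped
        if not cells:
            continue
        team = cells[0].get_text(strip=True)
        scores[team] = [int(td.get_text(strip=True)) for td in cells[1:]]
    return scores


# Trimmed copy of the question's HTML, wrapped in a comment (assumed layout).
SAMPLE = """
<div><!--
<table id="line_score"><tbody>
<tr class="thead"><th>&nbsp;</th><th>1</th><th>2</th><th>3</th><th>4</th><th>T</th></tr>
<tr><td><a href="/teams/LAL/2020.html">LAL</a></td><td>25</td><td>29</td><td>31</td><td>17</td><td><strong>102</strong></td></tr>
<tr><td><a href="/teams/LAC/2020.html">LAC</a></td><td>22</td><td>40</td><td>23</td><td>27</td><td><strong>112</strong></td></tr>
</tbody></table>
--></div>
"""

scores = line_score_rows(find_table(SAMPLE, "line_score"))
print(scores)
# {'LAL': [25, 29, 31, 17, 102], 'LAC': [22, 40, 23, 27, 112]}
```

To try it on the real page, you would pass `resp.content` from the question's `requests` call to `find_table` instead of `SAMPLE`. If it returns `None`, the assumption about comments does not hold for that page and the Selenium approach above is the safer route.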