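Trying to extract the stats from the following HTML and convert them into something like awayPlayers = [['Carmelo Anthony','30','19','5','3'],['Kristaps Porzingis'....]] so that I can easily display them in my own format and work with the data.
I've got the basics of BeautifulSoup down, but I'm a bit lost on this project, because the stats I want are all just wrapped in plain <td> tags. Any help is much appreciated!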
<div class="standings">
<h3 class="standings-title">NYK</h3>
<div class="awayTeam-boxscore">
<table>
<tbody>
<tr class="table-header">
<td>Name</td>
<td>MIN</td>
<td>PTS</td>
<td>REB</td>
<td>AST</td>
</tr>
<tr>
<td><a href="/feature/player/carmelo_anthony/index.html?locale=en_US">C.Anthony</a></td>
<td>30</td>
<td>19</td>
<td>5</td>
<td>3</td>
</tr>
<tr>
<td><a href="/feature/player/kristaps_porzingis/index.html?locale=en_US">K.Porzingis</a></td>
<td>33</td>
<td>16</td>
<td>7</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/joakim_noah/index.html?locale=en_US">J.Noah</a></td>
<td>20</td>
<td>0</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td><a href="/feature/player/courtney_lee/index.html?locale=en_US">C.Lee</a></td>
<td>20</td>
<td>0</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/derrick_rose/index.html?locale=en_US">D.Rose</a></td>
<td>30</td>
<td>17</td>
<td>3</td>
<td>1</td>
</tr>
<tr>
<td><a href="/feature/player/brandon_jennings/index.html?locale=en_US">B.Jennings</a></td>
<td>21</td>
<td>7</td>
<td>3</td>
<td>5</td>
</tr>
<tr>
<td><a href="/feature/player/kyle_oquinn/index.html?locale=en_US">K.O'Quinn</a></td>
<td>15</td>
<td>2</td>
<td>5</td>
<td>1</td>
</tr>
<tr>
<td><a href="/feature/player/lance_thomas/index.html?locale=en_US">L.Thomas</a></td>
<td>17</td>
<td>2</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td><a href="/feature/player/justin_holiday/index.html?locale=en_US">J.Holiday</a></td>
<td>26</td>
<td>8</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td><a href="/feature/player/willy_hernangomez/index.html?locale=en_US">W.Hernangomez</a></td>
<td>9</td>
<td>4</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/sasha_vujacic/index.html?locale=en_US">S.Vujacic</a></td>
<td>3</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td><a href="/feature/player/mindaugas_kuzminskas/index.html?locale=en_US">M.Kuzminskas</a></td>
<td>9</td>
<td>7</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/ron_baker/index.html?locale=en_US">R.Baker</a></td>
<td>7</td>
<td>5</td>
<td>1</td>
<td>0</td>
</tr>
</tbody>
</table>
</div>
<h3 class="standings-title">CLE</h3>
<div class="homeTeam-boxscore">
<table>
<tbody>
<tr class="table-header">
<td>Name</td>
<td>MIN</td>
<td>PTS</td>
<td>REB</td>
<td>AST</td>
</tr>
<tr>
<td><a href="/feature/player/lebron_james/index.html?locale=en_US">L.James</a></td>
<td>32</td>
<td>19</td>
<td>11</td>
<td>14</td>
</tr>
<tr>
<td><a href="/feature/player/kevin_love/index.html?locale=en_US">K.Love</a></td>
<td>25</td>
<td>23</td>
<td>12</td>
<td>2</td>
</tr>
<tr>
<td><a href="/feature/player/tristan_t_thompson/index.html?locale=en_US">T.Thompson</a></td>
<td>22</td>
<td>0</td>
<td>6</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/jr_smith/index.html?locale=en_US">J.Smith</a></td>
<td>25</td>
<td>8</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td><a href="/feature/player/kyrie_irving/index.html?locale=en_US">K.Irving</a></td>
<td>30</td>
<td>29</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td><a href="/feature/player/richard_jefferson/index.html?locale=en_US">R.Jefferson</a></td>
<td>26</td>
<td>13</td>
<td>4</td>
<td>1</td>
</tr>
<tr>
<td><a href="/feature/player/iman_shumpert/index.html?locale=en_US">I.Shumpert</a></td>
<td>14</td>
<td>2</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td><a href="/feature/player/mike_dunleavy/index.html?locale=en_US">M.Dunleavy</a></td>
<td>23</td>
<td>4</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td><a href="/feature/player/channing_frye/index.html?locale=en_US">C.Frye</a></td>
<td>14</td>
<td>6</td>
<td>4</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/jordan_mcrae/index.html?locale=en_US">J.McRae</a></td>
<td>6</td>
<td>2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/deandre_liggins/index.html?locale=en_US">D.Liggins</a></td>
<td>12</td>
<td>4</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td><a href="/feature/player/chris_andersen/index.html?locale=en_US">C.Andersen</a></td>
<td>6</td>
<td>2</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td><a href="/feature/player/james_jones/index.html?locale=en_US">J.Jones</a></td>
<td>6</td>
<td>5</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>
</div>
</div>
</section>
<footer>
<nav>
<div class="footer-nav">
<div class="access-key-navigation">
<div>
<span>0.</span>
<a accesskey="0" href="/feature/index.html?locale=en_US">Home</a>
</div>
<div>
<span>1.</span>
<a accesskey="1" href="/feature/about/index.html?locale=en_US">About</a>
</div>
<div class="selected">
<span>2.</span>
<a accesskey="2" href="/feature/scores/index.html?locale=en_US">Scores</a>
</div>
<div>
<span>3.</span>
<a accesskey="3" href="/feature/news/index.html?locale=en_US">News</a>
</div>
<div>
<span>4.</span>
<a accesskey="4" href="/feature/players/index.html?locale=en_US">Players</a>
</div>
<div>
<span>5.</span>
<a accesskey="5" href="/feature/season/leaders.html?locale=en_US">Leaders</a>
</div>
<div>
<span>6.</span>
<a accesskey="6" href="/feature/standings/index.html?locale=en_US">Standings</a>
</div>
<div>
<span>7.</span>
<a accesskey="7" href="/feature/teams/index.html?locale=en_US">Teams</a>
</div>
</div>
<div class="copyright">
© 2016 NBA Media Ventures, LLC. All rights reserved
</div>
</div>
</nav>
</footer>
</div>
Answer 0 (score: -1)
Each row represents a player, so you have to iterate over the <tr> tags and extract the data inside them. Here's how:
from bs4 import BeautifulSoup

# replace with the html
html_doc = """<div> ... </div>"""

soup = BeautifulSoup(html_doc, "html.parser")

# this is where we store the extracted data
players = []

# iterate through the table rows
for row in soup.find_all('tr'):
    # take the row's text (the cells are separated by \n in your case);
    # the "if data" filters out the empty entries
    player_data = [data for data in row.get_text().split("\n") if data]
    players.append(player_data)

# remove the first entry, which is the away team's header row
# (the home team's header row is still in the list, as the output shows)
del players[0]

print(players)
Output:
[['C.Anthony', '30', '19', '5', '3'], ['K.Porzingis', '33', '16', '7', '0'], ['J.Noah', '20', '0', '6', '3'], ['C.Lee', '20', '0', '3', '0'], ['D.Rose', '30', '17', '3', '1'], ['B.Jennings', '21', '7', '3', '5'], ["K.O'Quinn", '15', '2', '5', '1'], ['L.Thomas', '17', '2', '1', '1'], ['J.Holiday', '26', '8', '6', '2'], ['W.Hernangomez', '9', '4', '1', '0'], ['S.Vujacic', '3', '1', '0', '1'], ['M.Kuzminskas', '9', '7', '1', '0'], ['R.Baker', '7', '5', '1', '0'], ['Name', 'MIN', 'PTS', 'REB', 'AST'], ['L.James', '32', '19', '11', '14'], ['K.Love', '25', '23', '12', '2'], ['T.Thompson', '22', '0', '6', '0'], ['J.Smith', '25', '8', '3', '2'], ['K.Irving', '30', '29', '2', '4'], ['R.Jefferson', '26', '13', '4', '1'], ['I.Shumpert', '14', '2', '2', '3'], ['M.Dunleavy', '23', '4', '4', '2'], ['C.Frye', '14', '6', '4', '0'], ['J.McRae', '6', '2', '0', '0'], ['D.Liggins', '12', '4', '3', '3'], ['C.Andersen', '6', '2', '0', '0'], ['J.Jones', '6', '5', '0', '0']]
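In the output above, the home team's header row (['Name', 'MIN', 'PTS', 'REB', 'AST']) ends up in the middle of the list, and both teams are merged into one list. A minimal sketch that keeps the teams apart, as the question's awayPlayers format suggests, could look up each team's table through the awayTeam-boxscore and homeTeam-boxscore div classes from the posted HTML and skip every row with class="table-header". The helper name extract_team and the homePlayers variable are hypothetical, and html_doc is the question's HTML trimmed to two players per team.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (two players per team)
html_doc = """
<div class="awayTeam-boxscore"><table><tbody>
<tr class="table-header"><td>Name</td><td>MIN</td><td>PTS</td><td>REB</td><td>AST</td></tr>
<tr><td><a href="/feature/player/carmelo_anthony/index.html?locale=en_US">C.Anthony</a></td><td>30</td><td>19</td><td>5</td><td>3</td></tr>
<tr><td><a href="/feature/player/kristaps_porzingis/index.html?locale=en_US">K.Porzingis</a></td><td>33</td><td>16</td><td>7</td><td>0</td></tr>
</tbody></table></div>
<div class="homeTeam-boxscore"><table><tbody>
<tr class="table-header"><td>Name</td><td>MIN</td><td>PTS</td><td>REB</td><td>AST</td></tr>
<tr><td><a href="/feature/player/lebron_james/index.html?locale=en_US">L.James</a></td><td>32</td><td>19</td><td>11</td><td>14</td></tr>
<tr><td><a href="/feature/player/kevin_love/index.html?locale=en_US">K.Love</a></td><td>25</td><td>23</td><td>12</td><td>2</td></tr>
</tbody></table></div>
"""


def extract_team(soup, container_class):
    """Hypothetical helper: one [name, MIN, PTS, REB, AST] list per player
    row of the table inside <div class=container_class>."""
    container = soup.find("div", class_=container_class)
    rows = []
    for tr in container.find_all("tr"):
        # skip the header row ("Name", "MIN", ...) of each team's table
        if "table-header" in tr.get("class", []):
            continue
        # get_text(strip=True) trims whitespace, so this does not rely on
        # each <td> sitting on its own line
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


soup = BeautifulSoup(html_doc, "html.parser")
awayPlayers = extract_team(soup, "awayTeam-boxscore")
homePlayers = extract_team(soup, "homeTeam-boxscore")

print(awayPlayers)  # [['C.Anthony', '30', '19', '5', '3'], ['K.Porzingis', '33', '16', '7', '0']]
print(homePlayers)
```

With the real page, pass the full HTML to BeautifulSoup instead of the trimmed sample. The stats stay strings, so convert them with int() before doing arithmetic on them.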
If you want the full names, you have to extract them from the href of the <a> tag that wraps each player's name.
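The href values in the question follow the pattern /feature/player/<first>_<last>/index.html?locale=en_US, so one rough sketch is to take the path segment after player and title-case it. This is a heuristic that assumes every link follows that pattern: slugs such as kyle_oquinn or jr_smith come out as "Kyle Oquinn" and "Jr Smith", so apostrophes and initials are not restored. The function name full_name_from_href is hypothetical.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup


def full_name_from_href(href):
    """Hypothetical helper: '/feature/player/carmelo_anthony/index.html?locale=en_US'
    -> 'Carmelo Anthony' (assumes the slug follows the 'player' path segment)."""
    path = urlparse(href).path             # drops the '?locale=en_US' query string
    segments = path.strip("/").split("/")  # ['feature', 'player', 'carmelo_anthony', 'index.html']
    slug = segments[segments.index("player") + 1]
    return " ".join(word.capitalize() for word in slug.split("_"))


# One header row and one player row from the question's away-team table
html_doc = """<table><tbody>
<tr class="table-header"><td>Name</td><td>MIN</td><td>PTS</td><td>REB</td><td>AST</td></tr>
<tr><td><a href="/feature/player/carmelo_anthony/index.html?locale=en_US">C.Anthony</a></td><td>30</td><td>19</td><td>5</td><td>3</td></tr>
</tbody></table>"""

soup = BeautifulSoup(html_doc, "html.parser")
players = []
for tr in soup.find_all("tr"):
    link = tr.find("a")
    if link is None:  # the header row has no player link
        continue
    # remaining cells: MIN, PTS, REB, AST
    stats = [td.get_text(strip=True) for td in tr.find_all("td")[1:]]
    players.append([full_name_from_href(link["href"])] + stats)

print(players)  # [['Carmelo Anthony', '30', '19', '5', '3']]
```

Skipping rows without an <a> is an alternative to checking class="table-header"; it relies on only player rows containing a link, which holds for the HTML in the question.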