I'm new to BeautifulSoup and am trying to use it to pull some test data from NHL.com. This is my code so far, but I'm stuck...
Here is a snippet of the HTML I want to extract data from:
<tr>
<td rowspan="1" colspan="1"> … </td>
<td style="text-align: left;" rowspan="1" colspan="1">
<a href="/ice/player.htm?id=8474564">
Steven Stamkos
</a>
</td>
<td style="text-align: center;" rowspan="1" colspan="1">
<a href="javascript:void(0);" rel="TBL" onclick="loadTeamSpotlight(jQuery(this));" style="border-bottom:1px dotted;">
TBL
</a>
</td>
<td style="text-align: center;" rowspan="1" colspan="1">
C
</td>
<td style="center" rowspan="1" colspan="1">
16
</td>
<td style="center" rowspan="1" colspan="1">
14
</td>
<td style="center" rowspan="1" colspan="1">
9
</td>
I want to extract these fields for the whole page, so there are roughly 30 table rows. Here is my Python code so far; I'm not sure where to go from here.
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")
data = r.text
t_data=[]
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})
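I know it isn't much, but I don't know how to approach this. Thanks for any help.
Edit: I solved the problem; hopefully this helps someone in the future. Here is my code:
from bs4 import BeautifulSoup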
import requests
r = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")
player=[]
team=[]
goals=[]
assists=[]
cells=[]
points=[]
i=0
data = r.text
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})
row=[]
for rows in table.find_all('tr'):
    cells=rows.find_all('td')
    # keep only rows that have the full 19 cells
    if(len(cells)==19):
        player.append(cells[1].find(text=True))
        team.append(cells[2].find(text=True))
        goals.append(cells[5].find(text=True))
        assists.append(cells[6].find(text=True))
        points.append(cells[7].find(text=True))
        print(player[i],team[i],goals[i],assists[i],points[i])
        i=i+1
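For anyone who wants to try the row-filtering idea without hitting the site, here is a minimal sketch that runs against an inline fragment. The HTML below is a made-up, shortened table modeled on the snippet in the question (7 cells per row instead of 19), and `parse_rows` and `expected_cells` are names invented for this illustration. It also uses `get_text(strip=True)` rather than `find(text=True)`, which is a swapped-in choice so that whitespace around the text does not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the question's HTML; not real page data.
SAMPLE_HTML = """
<table class="data stats">
  <thead>
    <tr><th>#</th><th>Player</th><th>Team</th><th>Pos</th><th>GP</th><th>G</th><th>A</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><a href="/ice/player.htm?id=8474564">Steven Stamkos</a></td>
      <td><a href="javascript:void(0);" rel="TBL">TBL</a></td>
      <td>C</td><td>16</td><td>14</td><td>9</td>
    </tr>
    <tr><td colspan="7">spacer row without player data</td></tr>
  </tbody>
</table>
"""


def parse_rows(html, expected_cells=7):
    """Return one dict per table row that has exactly `expected_cells` <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for tr in soup.select("table.data.stats tr"):
        cells = tr.find_all("td")
        # Header rows (only <th>) and spacer rows fail this check and are skipped.
        if len(cells) != expected_cells:
            continue
        text = [cell.get_text(strip=True) for cell in cells]
        records.append({
            "player": text[1],
            "team": text[2],
            "goals": int(text[5]),
            "assists": int(text[6]),
        })
    return records


records = parse_rows(SAMPLE_HTML)
for record in records:
    print(record)
```

Keeping each row as one dict avoids the running index `i` and keeps the fields of a player together.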
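Answer 0 (score: 1)
I just want to post an alternative approach, so you don't have to use six separate lists to store data that belongs together. There is also a shorter, more elegant way to get all the expected rows.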
# getting data
#...
from bs4 import BeautifulSoup
from collections import namedtuple

soup = BeautifulSoup(data)
# this is where the data is collected
rows = list()
# named tuple to store the relevant data of one player
Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])
# get every row of the tbody in the specified table
for tr in soup.select('table.data.stats tbody tr'):
    # put the text contents of the row's cells in a list
    cellStrings = [cell.find(text = True) for cell in tr.findAll('td')]
    # add the player to the list of rows
    rows.append(
        Player(
            name=cellStrings[1],
            team=cellStrings[2],
            goals=cellStrings[5],
            assists=cellStrings[6],
            points=cellStrings[7]
        )
    )
rows
which looks like
[Player(name=u'Steven Stamkos', team=u'TBL', goals=u'14', assists=u'9', points=u'23'),
Player(name=u'Sidney Crosby', team=u'PIT', goals=u'8', assists=u'15', points=u'23'),
Player(name=u'Ryan Getzlaf', team=u'ANA', goals=u'10', assists=u'12', points=u'22'),
Player(name=u'Alexander Steen', team=u'STL', goals=u'14', assists=u'7', points=u'21'),
Player(name=u'Corey Perry', team=u'ANA', goals=u'11', assists=u'10', points=u'21'),
Player(name=u'Alex Ovechkin', team=u'WSH', goals=u'13', assists=u'7', points=u'20'),
....
and can be accessed like this
>>> rows[20].name
u'Bryan Little'
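Note that every field in these tuples is still a string. If you want to sort or do arithmetic, one possible follow-up is to convert the numeric fields first. The sketch below is only an illustration: it rebuilds the first three rows from the output above by hand, and `to_numeric` is a helper name made up here.

```python
from collections import namedtuple

Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])

# First three rows of the output shown above, typed in by hand.
rows = [
    Player('Steven Stamkos', 'TBL', '14', '9', '23'),
    Player('Sidney Crosby', 'PIT', '8', '15', '23'),
    Player('Ryan Getzlaf', 'ANA', '10', '12', '22'),
]


def to_numeric(player):
    """Return a copy of `player` with the stat fields converted to int."""
    return player._replace(
        goals=int(player.goals),
        assists=int(player.assists),
        points=int(player.points),
    )


numeric_rows = [to_numeric(p) for p in rows]

# With real integers, comparisons behave numerically instead of lexically.
top_scorer = max(numeric_rows, key=lambda p: p.goals)
print(top_scorer.name, top_scorer.goals)
```

Since namedtuples are immutable, `_replace` returns a new tuple and leaves the scraped rows untouched.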
Answer 1 (score: 0)
You haven't said exactly which data you need, but you can proceed along these lines:
from bs4 import BeautifulSoup
...
table = soup.find('table', {'class': 'data stats'})
# find_all returns every row; find would return only the first <tr>
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        print(col.text)
        link = col.find("a")
        if link:
            print(link.get("href"), link.get("rel"), link.get("onclick"), link.text)
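One detail about reading link attributes this way, sketched on the team cell from the question's HTML (trimmed to a single `<td>` and parsed with the built-in `html.parser`, which is an assumption for this example): with bs4, `get` returns `None` for an attribute that is not present, and `rel` comes back as a list because bs4 treats it as a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# The team cell from the question's HTML, reduced to one <td> for illustration.
CELL_HTML = (
    '<td style="text-align: center;">'
    '<a href="javascript:void(0);" rel="TBL" '
    'onclick="loadTeamSpotlight(jQuery(this));">TBL</a>'
    '</td>'
)

soup = BeautifulSoup(CELL_HTML, "html.parser")
link = soup.find("a")

href = link.get("href")        # plain string
rel = link.get("rel")          # list of values, since rel is multi-valued
onclick = link.get("onclick")  # plain string
title = link.get("title")      # attribute is absent, so this is None
label = link.get_text(strip=True)

print(href, rel, onclick, title, label)
```

If you need the team code as a plain string, taking it from the link text or from the first item of `rel` both work for this cell.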