I'm struggling to scrape ESPN NHL statistics with BeautifulSoup. I'm trying to create something like this:
Player,Team,GP,G,A,PTS,+/-,PIM,PTS/G,SOG,PCT,GWG,G,A,G,A,
Patrick Kane,RW,CHI,82,46,60,106,17,30,1.29,287,16.0,9,17,20,0,0
Jamie Benn,LW,DAL,82,41,48,89,7,64,1.09,247,16.6,5,17,13,2,3
Sidney Crosby,C,PIT,80,36,49,85,19,42,1.06,248,14.5,9,10,14,0,0
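So far I have something that loops over the table and extracts all the data, but everything ends up in a single column with no commas and no header: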
import urllib2
from bs4 import BeautifulSoup

url = "http://www.espn.com/nhl/statistics/player/_/stat/points"
page = urllib2.urlopen(url)
f = open('nhlstarter.txt', 'w')
soup = BeautifulSoup(page, "html.parser")
for tr in soup.select("#my-players-table tr[class*=player]"):
    for ob in range(1, 15):
        player_info = tr('td')[ob].get_text(strip=True)
        print(player_info)
        f.write(player_info + '\n')
f.close()
This produces:
Patrick Kane, RW
CHI
82
46
60
106
17
30
1.29
287
16.0
9
17
20
etc.
How can I turn this column of data into usable rows? I thought I might do something like the following:
for tr in soup.select("#my-players-table tr[class*=player]"):
    for ob in range(1,15):
        player_info + str(ob) = tr('td')[ob].get_text(strip=True)
        print(player_info + str(ob))
        f.write(player_info + str(ob) "," + player_info + str(ob) '\n')
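but it fails because the variable is not incremented correctly through the loop. Any suggestions on how to grab all of the table's columns at once, or how to loop so that I end up with a usable CSV, would be greatly appreciated.
Thanks for your help.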
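Answer 0 (score: 0)
You can first append each piece of player info to a list that represents the row, then join the list into a string when you write it to the file: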
for tr in soup.select("#my-players-table tr[class*=player]"):
    row = []
    for ob in range(1, 15):
        # Assuming player_info has the column data
        player_info = tr('td')[ob].get_text(strip=True)
        row.append(player_info)
    f.write(",".join(row) + "\n")
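As a possible variation on the list-per-row idea above, the rows could be handed to Python's standard csv module instead of being joined by hand. This is only a sketch, written for Python 3 and not tested against the live ESPN page: the helper names cells_to_row and write_rows are made up for illustration, and it assumes the first cell always looks like "Patrick Kane, RW" (name, comma, space, position), as in the output shown in the question. The sample cells are copied from that output rather than scraped.

```python
import csv
import io


def cells_to_row(cells):
    """Split the combined "Name, POS" first cell and return one flat row.

    Assumes the first cell has the form "Name, POS"; if no ", " is present,
    the position comes back as an empty string.
    """
    name, _, position = cells[0].partition(", ")
    return [name, position] + list(cells[1:])


def write_rows(rows, out):
    """Write each list of cell strings to `out` as one CSV line."""
    writer = csv.writer(out, lineterminator="\n")
    for cells in rows:
        writer.writerow(cells_to_row(cells))


# Cell text copied from the question's output, standing in for the values
# collected with tr('td')[ob].get_text(strip=True).
sample = [["Patrick Kane, RW", "CHI", "82", "46", "60", "106", "17", "30",
           "1.29", "287", "16.0", "9", "17", "20"]]

buf = io.StringIO()
write_rows(sample, buf)
print(buf.getvalue(), end="")
```

In the scraping loop, each completed row list would be passed through write_rows with the open file object in place of buf. Letting csv.writer build the line also means any cell that happens to contain a comma is quoted rather than silently split into extra columns.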