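I'm new to Python and am trying to scrape data from a website. The code below returns the information I'm looking for, but when I try to export the data to CSV, the file contains only the last row of the DataFrame. I think this is because each write to the dictionary replaces the previous row's data, but I'm not sure how to avoid that.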
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "http://elite.wttstats.pointstreak.com/boxscore.html?gameid=3149470"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
table = soup.find_all('table', {'class' : 'table nova-stats-table table-striped table-hover'})[0]
tbody = table.find('tbody')
rows = tbody.find_all('tr')[1:]
hockey = []
data = {
'#' : [],
'name' : [],
'G' : [],
'A' : [],
'PTS' : [],
'S' : [],
'PIM' : []
}
for row in rows:
    cells = row.findAll('td')
    data['#'] = cells[0].find(text=True)
    data['name'] = cells[1].find('a').find(text=True).strip("\t")
    data['G'] = cells[2].find(text=True)
    data['A'] = cells[3].find(text=True)
    data['PTS'] = cells[4].find(text=True)
    data['S'] = cells[5].find(text=True)
    data['PIM'] = cells[6].find(text=True)
    hockey = pd.DataFrame([data])
    print(hockey)
hockey.to_csv("Hockey Data.csv", columns = ["#", "name", "G", "A",
"PTS", "S", "PIM"])
Below is a sample of the hockey DataFrame output returned by print(hockey).
   #  A  G  PIM  PTS  S           name
0  3  0  0    7    0  0  Stadel, Riley
   #  A  G  PIM  PTS  S           name
0  4  0  0    0    0  0  Harding, Adam
    #  A  G  PIM  PTS  S             name
0  10  0  0    0    0  0  Brickler, Tyler
    #  A  G  PIM  PTS  S          name
0  11  0  0    2    0  0  Inglis, Kris
    #  A  G  PIM  PTS  S               name
0  12  1  0    0    1  0  Lévesque, Gabriel
    #  A  G  PIM  PTS  S            name
0  14  0  0    0    0  0  Cownie, Jordan
    #  A  G  PIM  PTS  S                 name
0  17  1  0    0    1  0  Mimar, Marc-Olivier
    #  A  G  PIM  PTS  S           name
0  19  0  1    0    1  0  Jensen, Jimmy
    #  A  G  PIM  PTS  S              name
0  23  2  0    2    2  0  Andersson, Johan
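
To make the suspected overwrite concrete, here is a minimal sketch of one possible way around it: build a fresh dict for each row, append it to a list, and create the DataFrame once after the loop. This is only an illustration under assumptions, not necessarily the best approach. The names `sample_rows` and `records` are hypothetical, `sample_rows` is a hard-coded stand-in for the text of the scraped `td` cells (three rows copied from the output above) so the sketch runs without a network request, and `index=False` is an added choice to leave the row index out of the CSV.

```python
import io

import pandas as pd

columns = ["#", "name", "G", "A", "PTS", "S", "PIM"]

# Hypothetical stand-in for the scraped cell text of each <tr>,
# in the same column order as the original loop reads the cells.
sample_rows = [
    ["3", "Stadel, Riley", "0", "0", "0", "0", "7"],
    ["12", "Lévesque, Gabriel", "0", "1", "1", "0", "0"],
    ["23", "Andersson, Johan", "0", "2", "2", "0", "2"],
]

records = []
for cells in sample_rows:
    # A new dict per row is appended, so earlier rows are not replaced.
    records.append(dict(zip(columns, cells)))

# Build the DataFrame once, after every row has been collected.
hockey = pd.DataFrame(records, columns=columns)

# Written to an in-memory buffer here; a file path such as
# "Hockey Data.csv" could be passed to to_csv instead.
buffer = io.StringIO()
hockey.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In the scraping loop, the same pattern would mean appending one dict per `row` and moving the `pd.DataFrame(...)` and `to_csv(...)` calls after the loop.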