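I am trying to add a 'Game ID' column to the table I am scraping (see the script below). I don't know where to add pd.DataFrame, or what to call it on (in my scrape), so that I can insert a new column named 'Game ID' before the script writes to the csv file (so that the file is written with the new game ID column).
(Just some background: the 'Game ID' is the i in the loop that iterates over the URLs being scraped.)
I tried entering it, but I don't know what to call my DataFrame (I tried pd.Dataframe [table,columns =' cols] but it won't read it).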
#ALL HOME GOALIES GAME STATS
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv

f = open('HOME_GOALIES_ALL.csv', 'a', newline = '')
writer = csv.writer(f)
GameID = i
for i in range (400961844,400961845):
    url = requests.get("http://www.espn.com/nhl/boxscore?gameId={}".format(i))
    if not url.ok:
        continue
    data = url.text
    soup = BeautifulSoup(data, 'lxml')
    table = soup.find_all('table', {'class' : 'mod-data'})[8].find_all('tr')[2:]
    for row in table:
        cols = row.findChildren(recursive=False)
        cols = [ele.text.strip() for ele in cols]
        writer.writerow(cols)
Answer 0 (score: 0)
There is no DataFrame in your code, but you can do it as follows:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv

table = []
df = pd.DataFrame()
for i in range (400961844,400961848):
    url = requests.get("http://www.espn.com/nhl/boxscore?gameId={}".format(i))
    if not url.ok:
        continue
    data = url.text
    soup = BeautifulSoup(data, 'lxml')
    #Add the game ID to the list of soups to keep track of multiple players with same game ID
    table.append((i,soup.find_all('table', {'class' : 'mod-data'})[8].find_all('tr')[2:]))

data = []
soups = []
game_id = []
for i,t in table:
    #Use .contents method to turn the soup into list of items
    soups = [j.contents for j in t]
    for s in soups:
        #Use .string method to parse the values of different columns
        data.append([a.string for a in s])
        #Append the Game ID
        game_id.append(i)
In [58]:
data
Out[58]:
[['H. Lundqvist', '25', '3', '22', '.880', '58:19', '0'],
['C. Anderson', '28', '4', '24', '.857', '65:00', '0'],
['J. Howard', '39', '2', '37', '.949', '60:00', '0'],
['C. Crawford', '29', '1', '28', '.966', '59:56', '0'],
['J. Gibson', '30', '4', '26', '.867', '59:53', '10'],
['J. Quick', '35', '0', '35', '1.000', '59:59', '0'],
['S. Bobrovsky', '29', '0', '29', '1.000', '59:53', '0'],
['A. Vasilevskiy', '36', '3', '33', '.917', '60:00', '0'],
['K. Lehtonen', '11', '2', '9', '.818', '15:00', '0'],
['B. Bishop', '19', '0', '19', '1.000', '43:58', '0'],
['F. Andersen', '35', '5', '30', '.857', '60:00', '0']]
#Create a DataFrame from the data extracted
df = pd.DataFrame(data)
In [59]:
df
Out[59]:
                 0   1  2   3      4      5   6
0     H. Lundqvist  25  3  22   .880  58:19   0
1      C. Anderson  28  4  24   .857  65:00   0
2        J. Howard  39  2  37   .949  60:00   0
3      C. Crawford  29  1  28   .966  59:56   0
4        J. Gibson  30  4  26   .867  59:53  10
5         J. Quick  35  0  35  1.000  59:59   0
6     S. Bobrovsky  29  0  29  1.000  59:53   0
7   A. Vasilevskiy  36  3  33   .917  60:00   0
8      K. Lehtonen  11  2   9   .818  15:00   0
9        B. Bishop  19  0  19  1.000  43:58   0
10     F. Andersen  35  5  30   .857  60:00   0
You can set the column names using: df.columns = [list_of_columns_names]
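For instance, naming the columns might look like the sketch below. The labels in column_names are my own guess at what the seven columns mean and are not taken from the scraped page, so check them against the table's header row before relying on them.

```python
import pandas as pd

# Three rows copied from the output shown earlier.
data = [
    ['H. Lundqvist', '25', '3', '22', '.880', '58:19', '0'],
    ['C. Anderson', '28', '4', '24', '.857', '65:00', '0'],
    ['J. Howard', '39', '2', '37', '.949', '60:00', '0'],
]

# Assumed labels (hypothetical); the source does not give the real header names.
column_names = ['Player', 'SA', 'GA', 'SV', 'SV%', 'TOI', 'PIM']

df = pd.DataFrame(data)
# The list must have exactly one name per existing column.
df.columns = column_names
print(df)
```

The same names could also be passed when the frame is built, as pd.DataFrame(data, columns=column_names).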
Now the important part: to add the 'Game ID' column, you can use the game_id list we created earlier: df['Game ID'] = game_id
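Here is a small offline sketch of why this works: game_id gets one entry per row, so it lines up with the DataFrame row by row. The scraped list below is a stand-in for the real scrape, and which goalie belongs to which game ID is made up purely for illustration.

```python
import pandas as pd

# Hypothetical pairing of game IDs and goalie rows, for illustration only.
scraped = [
    (400961844, [['H. Lundqvist', '25', '3', '22', '.880', '58:19', '0']]),
    (400961845, [['C. Anderson', '28', '4', '24', '.857', '65:00', '0']]),
    (400961846, [['J. Howard', '39', '2', '37', '.949', '60:00', '0'],
                 ['C. Crawford', '29', '1', '28', '.966', '59:56', '0']]),
]

data = []
game_id = []
for gid, rows in scraped:
    for row in rows:
        data.append(row)
        # One ID per row, so both lists always have the same length.
        game_id.append(gid)

df = pd.DataFrame(data)
df['Game ID'] = game_id  # assigning a list creates the new column
print(df)
```

If the list length did not match the number of rows, pandas would raise a ValueError on the assignment, which is why the ID is appended inside the inner loop rather than once per game.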
Finally, write the DataFrame to a CSV file: df.to_csv('path_of_file')
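One detail worth knowing is that to_csv writes the row index as an extra first column unless you pass index=False. A minimal sketch, writing to an in-memory buffer instead of a real path so it can be run anywhere; the column labels and the game ID pairing are assumptions for the example, not values from the scrape:

```python
import io

import pandas as pd

df = pd.DataFrame(
    [['H. Lundqvist', '25', '3'], ['C. Anderson', '28', '4']],
    columns=['Player', 'SA', 'GA'],  # assumed labels (hypothetical)
)
df['Game ID'] = [400961844, 400961845]  # hypothetical pairing

buffer = io.StringIO()  # stands in for a file path such as 'path_of_file'
df.to_csv(buffer, index=False)  # index=False leaves out the 0, 1, 2... row labels
csv_text = buffer.getvalue()
print(csv_text)
```

Since the original script opened its file in append mode, it may also help that to_csv accepts mode='a' together with header=False, which adds rows to an existing file without repeating the header line.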