I'm trying to scrape a web page and store the data in a CSV file, but I can't get my code to work

Posted: 2019-07-01 17:23:29

Tags: python web-scraping

I'm trying to scrape this page: https://www.premierleague.com/stats/top/players/goals?se=-1 but I haven't had any success.

Below is the code I tried.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import csv

url = "https://www.premierleague.com/stats/top/players/goals?se=-1"
html = urlopen(url)
bs = BeautifulSoup(html, 'html.parser')

#print(bs)

listings = []

for rows in bs.find_all("tr"):
    if ("oddrow" in rows["class"]) or ("evenrow" in rows["class"]):
        name = rows.find("div", class_="playerName").a.get_text()
        country = rows.find_all("td")[1].get_text()
        goals = rows.find_all("td")[4].get_text()
        listings.append([name, country, goals])

with open("EPL_TEST.csv", 'a', encoding = 'utf-8') as toWrite:
    writer = csv.writer(toWrite)
    writer.writerows(listings)

print("Data Fetched")

This is the error I get:

C:\Users\Siddhardh\Desktop\Python\Projects\FinalProject\venv\Scripts\python.exe C:/Users/Siddhardh/Desktop/Python/Projects/FinalProject/Scraping.py
Traceback (most recent call last):
  File "C:/Users/Siddhardh/Desktop/Python/Projects/FinalProject/Scraping.py", line 16, in <module>
    if("oddrow" in rows["class"]) or ("evenrow" in rows["class"]):
  File "C:\Users\Siddhardh\Desktop\Python\Projects\FinalProject\venv\lib\site-packages\bs4\element.py", line 1016, in __getitem__
    return self.attrs[key]
KeyError: 'class'

Process finished with exit code 1
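The KeyError comes from bs4's Tag["class"], which looks the attribute up in a dictionary and fails on any <tr> that has no class attribute at all (a header row, for example). A minimal sketch of a guard using Tag.get with a default, run against a small hypothetical table; the oddrow/evenrow class names are taken from the question's code and are not confirmed to exist on the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a header row with no class attribute, then two data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Goals</th></tr>
  <tr class="oddrow"><td>Player A</td><td>10</td></tr>
  <tr class="evenrow"><td>Player B</td><td>7</td></tr>
</table>
"""

def data_rows(html):
    """Return the <tr> tags whose class list contains 'oddrow' or 'evenrow'."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("tr"):
        # row["class"] raises KeyError on a <tr> without a class attribute;
        # row.get("class", []) returns an empty list for such rows instead.
        classes = row.get("class", [])
        if "oddrow" in classes or "evenrow" in classes:
            rows.append(row)
    return rows

rows = data_rows(SAMPLE_HTML)
print([[td.get_text() for td in r.find_all("td")] for r in rows])
# [['Player A', '10'], ['Player B', '7']]
```

Because bs4 returns class as a list of individual class names, the membership test behaves the way the original condition intended, just without crashing on rows that lack the attribute.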

I need to get the name, country, and goals of every player into the CSV file.

P.S. Please excuse my formatting; this is my first post here. I'll learn.

2 Answers:

Answer 0 (score: 0)

It looks like you need to change the middle part of your code to:

listings = []

# Collect the three columns as parallel lists, then pair them up row by row
names = bs.find_all("td", scope="row")
countries = bs.find_all("span", {"class": "playerCountry"})
goals = bs.find_all("td", class_="mainStat")
for name, country, goal in zip(names, countries, goals):
    listings.append([name.text.strip(), country.text.strip(), goal.text.strip()])

Printing listings gives the following output:

['Alan Shearer', 'England', '260']
['Wayne Rooney', 'England', '208']
['Andrew Cole', 'England', '187']
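The rows still have to reach the CSV file the question asks for. A minimal sketch of both steps, run against a small hypothetical HTML snippet that only imitates the selectors used in this answer (scope="row", playerCountry, mainStat); the live page's markup is not confirmed here. Since zip stops at the shortest list, a single missing cell would silently shift every following row, so the sketch checks the lengths first. The file is opened with "w" and newline="", as the csv module documentation recommends; the question's "a" mode would append a fresh copy of every row on each run.

```python
import csv
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the selectors from the answer above.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td scope="row"><a>Alan Shearer</a></td>
    <td><span class="playerCountry">England</span></td>
    <td class="mainStat text-centre">260</td>
  </tr>
  <tr>
    <td scope="row"><a>Wayne Rooney</a></td>
    <td><span class="playerCountry">England</span></td>
    <td class="mainStat text-centre">208</td>
  </tr>
</tbody></table>
"""

def extract_listings(html):
    """Return [name, country, goals] rows built from three parallel lists."""
    bs = BeautifulSoup(html, "html.parser")
    names = bs.find_all("td", scope="row")
    countries = bs.find_all("span", {"class": "playerCountry"})
    goals = bs.find_all("td", class_="mainStat")
    # zip() silently truncates to the shortest list, so fail loudly instead.
    if not (len(names) == len(countries) == len(goals)):
        raise ValueError("column lengths differ; rows would be misaligned")
    return [[n.text.strip(), c.text.strip(), g.text.strip()]
            for n, c, g in zip(names, countries, goals)]

def write_csv(path, listings):
    """Overwrite path with a header row followed by the listings."""
    with open(path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["name", "country", "goals"])
        writer.writerows(listings)

listings = extract_listings(SAMPLE_HTML)
path = os.path.join(tempfile.gettempdir(), "EPL_TEST.csv")
write_csv(path, listings)
print(listings)
# [['Alan Shearer', 'England', '260'], ['Wayne Rooney', 'England', '208']]
```

For the real page, pass the downloaded HTML to extract_listings and a path such as "EPL_TEST.csv" to write_csv; the temporary-directory path here just keeps the sketch free of side effects.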

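Answer 1 (score: 0)

Try the script below to get all the names across multiple pages and write them to a CSV file. You can find the link I used in the script with Chrome DevTools; that link returns a JSON response. Modify the script to get all the other fields you need.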


I used page = 112 so the script starts collecting names from that page. Feel free to change the starting page value to get the names from first to last.

import csv
import requests

url = "https://footballapi.pulselive.com/football/stats/ranked/players/goals?page={}&pageSize=20&comps=1&compCodeForActivePlayer=EN_PR&altIds=true"

headers = {
    'Origin': 'https://www.premierleague.com',
}

def get_items(link, page):
    while True:
        res = requests.get(link.format(page), headers=headers)
        content = res.json()['stats']['content']
        # An empty page means there are no more players to fetch
        if not content:
            break
        for item in content:
            player_name = item['owner']['name']['display']
            yield player_name
        page += 1

if __name__ == '__main__':
    page = 112
    with open("player_info.csv", "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['player'])
        for name in get_items(url, page):
            writer.writerow([name])
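The get_items generator in this answer keeps requesting page after page until the stats.content list in the JSON comes back empty. A minimal sketch of just that stopping logic, with the HTTP call replaced by a hypothetical fetch_page callable so it runs offline; the dictionary shape mirrors the keys the answer reads (stats, content, owner, name, display) and is not independently confirmed against the live API. The start page defaults to 0 here only to suit the fake data.

```python
from typing import Callable, Iterator

def iter_player_names(fetch_page: Callable[[int], dict], page: int = 0) -> Iterator[str]:
    """Yield display names page by page until a page has no content."""
    while True:
        data = fetch_page(page)  # parsed JSON for one page
        content = data["stats"]["content"]
        if not content:  # an empty page means there is nothing left
            return
        for item in content:
            yield item["owner"]["name"]["display"]
        page += 1

# Hypothetical stand-in for requests.get(url.format(page), headers=headers).json()
FAKE_PAGES = {
    0: {"stats": {"content": [{"owner": {"name": {"display": "Player A"}}},
                              {"owner": {"name": {"display": "Player B"}}}]}},
    1: {"stats": {"content": [{"owner": {"name": {"display": "Player C"}}}]}},
}

def fake_fetch(page: int) -> dict:
    """Return the fake page, or an empty page past the end."""
    return FAKE_PAGES.get(page, {"stats": {"content": []}})

names = list(iter_player_names(fake_fetch))
print(names)
# ['Player A', 'Player B', 'Player C']
```

Against the real endpoint, fetch_page could be lambda p: requests.get(url.format(p), headers=headers).json(), reusing url and headers from the script. Passing the fetch function in keeps the pagination logic testable without network access.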