I'm new to web scraping.
I can't find my mistake in this code:
import requests
import csv
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
response=requests.get(url)
html_icerigi=response.content
soup=BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)
Answer 0 (score: 2)
It does work. The problem is that the site has anti-scraping protection, so you need to set a User-Agent on the request.
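Below is a minimal sketch of sending a browser-like User-Agent with requests. It assumes the site only checks that header, which the answer suggests but does not prove. The UA string and the helper names make_session and fetch_html are illustrative, not taken from the answer.

```python
import requests

# Assumption: a generic desktop-browser User-Agent; the exact string is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_session() -> requests.Session:
    """Return a Session that sends the browser-like User-Agent with every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    return session


def fetch_html(url: str) -> str:
    """Download a page with the browser-like User-Agent and return its decoded text."""
    with make_session() as session:
        response = session.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on 403/404 instead of parsing an error page
        return response.text
```

Usage: html = fetch_html("https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"). A Session is used so the header is set once and reused for any further requests to the same site.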
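The class tooltipstered is added dynamically, so you can drop it. With BeautifulSoup, match on the class spielprofil_tooltip alone instead of the string "spielprofil_tooltip tooltipstered".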
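The following sketch shows why the two-class string finds nothing in static HTML, while a single class name works. It uses a small hypothetical HTML snippet in which, as the answer suggests, the tooltipstered class has not been added yet because no JavaScript has run.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML, mimicking what the server sends before any JavaScript
# has added the extra "tooltipstered" class.
STATIC_HTML = """
<table>
  <tr><td><a class="spielprofil_tooltip" href="/kylian-mbappe/profil/spieler/342229">Kylian Mbappé</a></td></tr>
  <tr><td><a class="spielprofil_tooltip" href="/neymar/profil/spieler/68290">Neymar</a></td></tr>
</table>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# A class string containing a space is compared against the whole class attribute,
# so it only matches tags whose class is exactly "spielprofil_tooltip tooltipstered".
with_dynamic_class = soup.find_all("a", {"class": "spielprofil_tooltip tooltipstered"})

# A single class name matches any <a> whose class list contains it.
without_dynamic_class = soup.find_all("a", {"class": "spielprofil_tooltip"})

print(len(with_dynamic_class), len(without_dynamic_class))  # 0 2
```

The same single-class match would also work on the rendered page, since "spielprofil_tooltip" is still one of the classes there.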
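You are iterating over an empty list (footballer_list) instead of the list of <a> elements. Use response.text instead of response.content.
The variable is needlessly rewritten over several lines, and the list of lists is probably a mistake; perhaps you meant to append a dictionary instead of a list.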
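The sketch below shows the corrected loop, assuming the HTML has already been fetched, for example with a User-Agent header or from a rendered browser page. It iterates over the found <a> tags rather than the empty result list, and it collects flat strings. It also shows a dictionary variant. The function names extract_footballers and extract_footballer_dicts, and the dict keys "name" and "profile", are hypothetical.

```python
from bs4 import BeautifulSoup


def extract_footballers(html: str) -> list[str]:
    """Return 'Futbolcu:<name>' strings for every player link in the page."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", {"class": "spielprofil_tooltip"})  # the elements to loop over
    footballer_list = []
    for link in links:  # iterate over the found <a> tags, not the (empty) output list
        name = link.get_text().replace("\n", "").strip()  # clean the name in one expression
        footballer_list.append("Futbolcu:{}".format(name))  # flat list of strings
    return footballer_list


def extract_footballer_dicts(html: str) -> list[dict[str, str]]:
    """Variant returning one dict per player; the keys 'name' and 'profile' are hypothetical."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"name": link.get_text().replace("\n", "").strip(), "profile": link["href"]}
        for link in soup.find_all("a", {"class": "spielprofil_tooltip"})
    ]
```

Using a separate name (link) for the loop variable also avoids overwriting the find_all result, which is what happens in the question when footballer is reused.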
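Fixed code (the loop; soup must come from a page fetched with a User-Agent header):
footballer_list = []
for footballer in soup.find_all("a", {"class": "spielprofil_tooltip"}):
    footballer_list.append(["Futbolcu:{}".format(footballer.text.strip())])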
Result:
[['Futbolcu:Kylian Mbappé'], ......, ['Futbolcu:Marlon Freitas']]
Answer 1 (score: 1)
Install Selenium and load the page through it. Other than that, your code seems to work:
import bs4
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop')
html_icerigi = browser.page_source
soup = bs4.BeautifulSoup(html_icerigi,"html.parser")
footballer = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list=[]
for footballer in footballer_list:
    footballer=footballer.text
    footballer=footballer.strip()
    footballer=footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer)
browser.close()
Output:
[<a class="spielprofil_tooltip tooltipstered" href="/kylian-mbappe/profil/spieler/342229" id="342229">Kylian Mbappé</a>, <a class="spielprofil_tooltip tooltipstered" href="/neymar/profil/spieler/68290" id="68290">Neymar</a>, <a class="spielprofil_tooltip tooltipstered" href="/lionel-messi/profil/spieler/28003" id="28003">Lionel Messi</a>, <a class="spielprofil_tooltip tooltipstered" href="/mohamed-salah/profil/spieler/148455" id="148455">Mohamed Salah</a>, <a...
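Answer 2 (score: 1)
Apart from selenium, you can also use requests_html to render the page. But whatever the reason you get nothing back, your for loop is wrong: even after running the JavaScript and getting the complete HTML, you would still end up with an empty footballer_list.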
import requests_html
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/spieler-statistik/wertvollstespieler/marktwertetop"
with requests_html.HTMLSession() as s:
    resp = s.get(url)
    resp.html.render()
    page = resp.html.raw_html
soup = BeautifulSoup(page,"html.parser")
footballer_all = soup.find_all("a",{"class":"spielprofil_tooltip tooltipstered"})
footballer_list = []
for footballer in footballer_all:
    footballer = footballer.text
    footballer = footballer.strip()
    footballer = footballer.replace("\n","")
    footballer_list.append(["Futbolcu:{}".format(footballer)])
print(footballer_list)