I'm trying to download this image by requesting its URL, but I'm getting some error on line 17 and can't figure out what the problem is.
I tried prepending http:// to the URL to make it a complete URL.
Here is the code I wrote:
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import os
driver = webdriver.Chrome(executable_path= r'E:/Summer/FirstThings/Web scraping (bucky + pdf)/webscraping/tutorials-master/chromedriver.exe')
url = 'https://www.nba.com/players/jaylen/adams/1629121'
driver.get(url)
#print(driver.page_source)
soup = BeautifulSoup(driver.page_source , 'lxml')
div = soup.find('section' , class_='nba-player-header__item nba-player-header__headshot')
img = div.find('img')
print("")
m=('http://'+ img['src'])
f = open('jaylen_adams.jpg','w')
f.write(requests.get(m).content)
f.close()
driver.__exit__()
Answer 0 (score: 1)
Errors found:
First, you need to fix the URL: prepending 'http://' to a src that already starts with // produced the invalid http:////ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/1629121.png. So change the line to:
m=('http:'+ img['src'])
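As a side note (not part of the original answer): a src beginning with // is a protocol-relative URL, and the standard library's urllib.parse.urljoin can resolve it against the page URL instead of concatenating strings by hand:

```python
from urllib.parse import urljoin

# A protocol-relative src, as scraped from the page
src = '//ak-static.cms.nba.com/wp-content/uploads/headshots/nba/latest/260x190/1629121.png'

# urljoin adopts the scheme of the base URL, so no manual 'http:' prefix is needed
full_url = urljoin('https://www.nba.com/players/jaylen/adams/1629121', src)
print(full_url)
```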
Second, you need to write the content as bytes. So change to:
f = open('C:/jaylen_adams.jpg','wb')
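Why 'wb' matters: requests.get(m).content is bytes, and a file opened in text mode ('w') only accepts str, so the original f.write(...) raises a TypeError. A minimal illustration (using a throwaway temp file, not the actual image):

```python
import os
import tempfile

data = b'\x89PNG\r\n'  # bytes, like response.content
path = os.path.join(tempfile.gettempdir(), 'demo_write.bin')

try:
    with open(path, 'w') as f:   # text mode: write() expects str
        f.write(data)
except TypeError as e:
    print('text mode rejected bytes:', e)

with open(path, 'wb') as f:      # binary mode: write() accepts bytes
    f.write(data)
```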
Code:
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import os
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
url = 'https://www.nba.com/players/jaylen/adams/1629121'
driver.get(url)
#print(driver.page_source)
soup = BeautifulSoup(driver.page_source , 'lxml')
div = soup.find('section' , class_='nba-player-header__item nba-player-header__headshot')
img = div.find('img')
print("")
m=('http:'+ img['src']) # <----- edit made here
f = open('C:/jaylen_adams.jpg','wb') # <---- edit made here
f.write(requests.get(m).content)
f.close()
driver.__exit__()
ALSO: there is no need to use selenium here, since it can slow the process down if you are going through multiple pages. You can simplify this using requests alone, and if you open the file in a with statement you don't need to call .close(), because the file is closed automatically when the block finishes:
Shorter code:
from bs4 import BeautifulSoup
import requests
url = 'https://www.nba.com/players/jaylen/adams/1629121'
response = requests.get(url)
soup = BeautifulSoup(response.text , 'lxml')
div = soup.find('section' , class_='nba-player-header__item nba-player-header__headshot')
img = div.find('img')
print("")
m=('http:'+ img['src'])
with open('C:/jaylen_adams.jpg','wb') as f:
    f.write(requests.get(m).content)
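One optional refinement beyond the answer above (my own suggestion, not part of the original): for larger files, requests can stream the response to disk in chunks via iter_content, and raise_for_status fails fast on a bad response instead of saving an error page as the image:

```python
import requests

def download(url, path, chunk_size=8192):
    """Stream url to path without holding the whole body in memory."""
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # raise on 4xx/5xx before writing anything
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```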