I need to change the xkcd project from the AtBS book so that it downloads comics from a different website. Here is my script.
#! python3
# getwebcomic.py - Downloads every single smbc comic.
import requests, os, bs4
os.chdir('C:\\Users\\Bob\\Desktop\\')
url = 'https://smbc-comics.com' # starting url
os.makedirs('smbc', exist_ok=True) # store comics in ./smbc
noAbuse=0
for noAbuse in range(0, 5):
#while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    # Find the URL of the comic image.
    comicElem = soup.select('#cc-comicbody')
    print('I am finding it')
    print(comicElem)
    if comicElem == []:
        print('Could not find comic image.')
    else:
        print(comicElem[0].get('src'))
        print('I dont know why the .get is returning NONE!')
        print('It is there???')
        print('...and now it crashes')
        comicUrl = 'https//smbc-comics.com' + comicElem[0].get('src')
        print(comicUrl)
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()
        # Save the image to ./smbc
        imageFile = open(os.path.join('smbc', os.path.basename(comicUrl)),
                         'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://smbc-comic.com' + prevLink.get('href')
print('Done.')
Output
Downloading page https://smbc-comics.com...
I am finding it
[<div id="cc-comicbody"><img border="0" id="cc-comic"
src="/comics/1524150658-20170419 (1).png"
title="You can also just use an infinite quantity of compasses as on-off switches."/><br/></div>]
None
I dont know why the .get is returning NONE!
It is there???
...and now it crashes
Traceback (most recent call last):
  File "C:\Users\Bob\PythonScripts\getwebcomic.py", line 31, in <module>
    comicUrl = 'https//smbc-comics.com' + comicElem[0].get('src')
TypeError: must be str, not NoneType
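I can't figure out why the .get method returns None when the 'src' attribute is clearly there. Any hints would be appreciated. I added some extra print() calls to help me see what happens while the script runs.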
Answer 0 (score: 0)
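comicElem[0] is a div (<div>). It has no src attribute of its own, which is why .get returns None: the src belongs to the <img> nested inside the div. Try comicElem[0].img.get("src") instead, which returns "/comics/1524150658-20170419 (1).png".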
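To see the difference without hitting the site, here is a minimal sketch that parses the HTML copied from the question's printed output. The selector '#cc-comicbody img' and the use of urljoin are my own suggestions rather than part of the fix above. urljoin is there because the base string in the question, 'https//smbc-comics.com', is missing the colon after https, so the next requests.get would likely fail even once src is found.

```python
import bs4
from urllib.parse import urljoin

# HTML copied from the question's printed output (not fetched live).
html = ('<div id="cc-comicbody"><img border="0" id="cc-comic" '
        'src="/comics/1524150658-20170419 (1).png" '
        'title="You can also just use an infinite quantity of compasses '
        'as on-off switches."/><br/></div>')

soup = bs4.BeautifulSoup(html, 'html.parser')
comicElem = soup.select('#cc-comicbody')

# The <div> itself has no src attribute, so .get returns None.
div_src = comicElem[0].get('src')

# The nested <img> tag is the element that carries src.
img_src = comicElem[0].img.get('src')

# Alternative (my assumption about what fits here): select the <img> directly.
direct_src = soup.select('#cc-comicbody img')[0].get('src')

# Build the absolute URL from a base that includes the colon after https.
comicUrl = urljoin('https://smbc-comics.com', img_src)

print(div_src)
print(img_src)
print(comicUrl)
```

If you select the <img> directly, the existing "if comicElem == []" check still covers pages that have no comic image.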