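I wrote a Python script to download all of the xkcd comic images. The only problem is that I can't tell it to stop once it reaches the last comic. Here is what I have so far.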
import re, mechanize
from urllib import urlretrieve
from BeautifulSoup import BeautifulSoup as bs
baseUrl = "http://xkcd.com/1/" #Specify the first comic page
br = mechanize.Browser() #Create a browser
response = br.open(baseUrl) #Create an initial response
x = 1 #Assign an initial file name
while (SomeCondition):
    soup = bs(response.get_data()) #Create an instance of bs that contains the response data
    img = soup.findAll('img')[1] #Get the online file path of the image
    localFile = "C:\\Comics\\xkcd\\" + str(x) + ".jpg" #Come up with a local file name
    urlretrieve(img["src"], localFile) #Download the image file
    response = br.follow_link(text = "Next >") #Store the response of the next button
    x += 1 #Increase x by 1
print "All xkcd comics downloaded" #Let the user know the images have been downloaded
Originally, what I had was something like
while br.follow_link(text = "Next >") != br.follow_link(text = ">|"):
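But every call to follow_link actually navigates the browser, so evaluating this condition sends the script straight to the last page before it has had a chance to do its job.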
Answer 0 (score: 1)
When you follow the "Next" link on the latest xkcd comic, a hash mark (#) is appended to the URL. Try something like the following.
while not br.geturl().endswith("#"):
    ...
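To see why this condition stops in the right place, here is a minimal sketch that runs without any network access. FakeBrowser and follow_next are hypothetical stand-ins, not part of mechanize; they only mimic the behaviour described above, where following "Next" on the newest comic leaves a trailing "#" on the URL. The comic count of 3 is made up for the demonstration.

```python
class FakeBrowser(object):
    """Hypothetical stand-in for mechanize.Browser; it serves no real pages."""

    def __init__(self, base, last):
        self.base = base          # site root, e.g. "http://xkcd.com/"
        self.last = last          # number of the newest comic (assumed)
        self.url = base + "1/"    # start on the first comic

    def geturl(self):
        return self.url

    def follow_next(self):
        # Mimic the "Next" link: on the newest comic it points to "#",
        # so the URL gains a trailing "#" instead of moving forward.
        number = int(self.url.rstrip("#").rstrip("/").rsplit("/", 1)[1])
        if number >= self.last:
            self.url = "%s%d/#" % (self.base, number)
        else:
            self.url = "%s%d/" % (self.base, number + 1)


def crawl(br):
    """Visit every page once, stopping after the newest comic."""
    visited = []
    while not br.geturl().endswith("#"):
        visited.append(br.geturl())   # the download step would go here
        br.follow_next()              # navigate exactly once per iteration
    return visited


pages = crawl(FakeBrowser("http://xkcd.com/", 3))
print(pages)
```

Because the check reads only the current URL and the loop body navigates exactly once per pass, the newest comic is still processed before the loop exits. In the real script, the body would be the download steps from the question followed by a single br.follow_link(text = "Next >") call.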