Question

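I wrote a Python script to download all of the xkcd comic images. The only problem is that I can't tell it to stop when it reaches the last one. Here is what I have so far.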

import re, mechanize
from urllib import urlretrieve
from BeautifulSoup import BeautifulSoup as bs

baseUrl = "http://xkcd.com/1/" #Specify the first comic page
br = mechanize.Browser() #Create a browser

response = br.open(baseUrl) #Create an initial response

x = 1 #Assign an initial file name
while (SomeCondition):
    soup = bs(response.get_data()) #Create an instance of bs that contains the response data
    img = soup.findAll('img')[1] #Get the online file path of the image
    localFile = "C:\\Comics\\xkcd\\" + str(x) + ".jpg"  #Come up with a local file name
    urlretrieve(img["src"], localFile) #Download the image file
    response = br.follow_link(text = "Next >") #Store the response of the next button
    x += 1 #Increase x by 1
print "All xkcd comics downloaded" #Let the user know the images have been downloaded

Originally, what I had was something like

while br.follow_link(text = "Next >") != br.follow_link(text = ">|"):

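but evaluating that condition actually follows both links, so the browser skips to the last page before the script has a chance to do its intended job.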

Answer 1

When you follow the "Next" link from the latest xkcd comic, a hash mark is appended to the URL. Try something like the following.

while not br.geturl().endswith("#"):
    ...
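
To show how that stop condition behaves, here is a minimal, network-free sketch of the loop. It assumes the behavior described above, namely that following "Next" from the newest comic yields a URL ending in "#". The names `is_last_page`, `walk_comics`, and `fake_next` are hypothetical and stand in for the mechanize calls in the question; `fake_next` simulates a site with only three comics.

```python
def is_last_page(url):
    """Return True once the URL ends with '#', the marker the answer relies on."""
    return url.endswith("#")


def walk_comics(first_url, follow_next, handle_page):
    """Visit pages starting at first_url until the current URL ends with '#'.

    follow_next(url) returns the URL reached by the "Next" link, and
    handle_page(url) does the per-page work (e.g. downloading the image).
    Both are hypothetical callables, not part of mechanize.
    """
    url = first_url
    visited = 0
    while not is_last_page(url):
        handle_page(url)        # process the current comic first
        url = follow_next(url)  # then move on, as br.follow_link does
        visited += 1
    return visited


def fake_next(url, last=3):
    """Simulated "Next" link: the last comic links to itself plus '#'."""
    number = int(url.rstrip("/").rsplit("/", 1)[1])
    if number == last:
        return url + "#"
    return "http://xkcd.com/%d/" % (number + 1)


if __name__ == "__main__":
    seen = []
    count = walk_comics("http://xkcd.com/1/", fake_next, seen.append)
    print(count, seen)
```

Because the check runs before each page is handled, the last comic is still processed: its own URL does not end in "#", and only the follow-up request does.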
