Question

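I'm trying to log in to a web-scraping practice site. The username can be anything, and the password is a number between 0 and 30. To find the password I have to make many attempts, so the spider needs to keep sending requests with different passwords.

However, with my code the spider only makes two attempts and then stops: the first from start_requests, the second from parse.

Can you help me?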

import scrapy
from scrapy import Request
from bs4 import BeautifulSoup

class heibanke2(scrapy.Spider):
    name = "herbanke2"
    # start_urls = ["http://www.heibanke.com/lesson/crawler_ex01/"]
    password = 0  # current password guess

    def parse(self, response):
        print("enter parse")
        self.password += 1
        # Save each response body so the attempts can be inspected later.
        with open("try" + str(self.password), "wb") as f:
            f.write(response.body)
        # Request the same URL again with the next password guess.
        yield Request(url="http://www.heibanke.com/lesson/crawler_ex01/", callback=self.parse, cookies={'username':str(1), "password":str(self.password)})

    def start_requests(self):
        print("prepared to login")
        yield Request(url="http://www.heibanke.com/lesson/crawler_ex01/", callback=self.parse, cookies={'username':str(1), "password":str(self.password)})

Answer 1

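This happens because Scrapy's duplicate request filter is active. Since the URL itself doesn't change, the subsequent requests are filtered out as duplicates. The cookies do differ, but by default they are not part of what the filter compares. Turn the filter off for this request by passing dont_filter=True: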

def parse(self, response):
    print("enter parse")
    self.password += 1
    with open("try" + str(self.password), "wb") as f:
        f.write(response.body)
    yield Request(url="http://www.heibanke.com/lesson/crawler_ex01/",
                  callback=self.parse,
                  cookies={'username':str(1), "password":str(self.password)},
                  dont_filter=True)  # HERE: bypass the duplicate filter
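
To make the effect of the filter easier to see, here is a minimal standard-library sketch of the idea. It is a toy model, not Scrapy's actual implementation: the names fingerprint, ToyDupeFilter, and simulate are hypothetical, and the assumption that the fingerprint covers only the method, URL, and body (and not the cookies) is a simplification of the default behavior.

```python
import hashlib

URL = "http://www.heibanke.com/lesson/crawler_ex01/"


def fingerprint(method, url, body=b""):
    """Toy fingerprint: hash of method, URL, and body.

    Cookies are deliberately left out (an assumption of this sketch),
    so requests that differ only in cookies look identical.
    """
    h = hashlib.sha1()
    h.update(method.encode("utf-8"))
    h.update(url.encode("utf-8"))
    h.update(body)
    return h.hexdigest()


class ToyDupeFilter(object):
    """Hypothetical stand-in for a duplicate request filter."""

    def __init__(self):
        self.seen = set()

    def should_schedule(self, method, url, body=b"", dont_filter=False):
        # dont_filter=True skips the duplicate check entirely.
        if dont_filter:
            return True
        fp = fingerprint(method, url, body)
        if fp in self.seen:
            return False  # same fingerprint already seen: drop the request
        self.seen.add(fp)
        return True


def simulate(dont_filter):
    """Count how many of the 31 password attempts (0..30) get scheduled."""
    dupefilter = ToyDupeFilter()
    scheduled = 0
    for password in range(31):
        # The password would travel only in the cookies, which the toy
        # fingerprint ignores, so every attempt has the same fingerprint.
        if dupefilter.should_schedule("GET", URL, dont_filter=dont_filter):
            scheduled += 1
    return scheduled
```

In this toy model, simulate(False) lets only the first attempt through, while simulate(True) schedules all 31. The exact number of requests a real crawl makes before stopping may differ, but the mechanism is the same: identical fingerprints are dropped unless dont_filter=True is set.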
