Question

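I searched through several questions on this topic but couldn't find one that solves my problem.

I'm currently trying to use multiple parsers on one site, depending on the product I'm searching for. After trying a few things, I ended up with this:

With this start_requests: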

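from scrapy import Request

def start_requests(self):
    # Read one search keyword per line, dropping the trailing newline and blank lines
    with open('productosABuscar.txt', 'r') as txtfile:
        keywords = [line.strip() for line in txtfile if line.strip()]

    for keyword in keywords:
        yield Request(self.search_url.format(keyword))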
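Lines read from a file keep their trailing newline (readlines() included), so each keyword has to be stripped before it is formatted into the URL; otherwise a "\n" ends up in the query. A small pure-Python sketch of cleaning and URL-encoding the keywords, assuming a hypothetical SEARCH_URL in place of the question's self.search_url (whose real value isn't shown); quote_plus is an addition that the original code does not use:

```python
from urllib.parse import quote_plus

# Hypothetical stand-in for self.search_url; the real template is not shown in the question.
SEARCH_URL = 'https://www.example.com/s?k={}'


def build_search_urls(lines):
    """Turn raw lines from the keyword file into search URLs."""
    urls = []
    for line in lines:
        keyword = line.strip()  # drop the trailing '\n' that file lines keep
        if not keyword:
            continue  # skip blank lines
        # quote_plus encodes spaces as '+' and escapes other unsafe characters
        urls.append(SEARCH_URL.format(quote_plus(keyword)))
    return urls


print(build_search_urls(['laptop\n', 'gaming tablet\n', '\n']))
# ['https://www.example.com/s?k=laptop', 'https://www.example.com/s?k=gaming+tablet']
```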

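This goes into my normal parse_item.

What I'd like to do is use this parse_item to check the item's category (laptop, tablet, etc.) and hand it off to a category-specific parser: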

import re

def parse_item(self, response):
    # I get the item's category for the if/else
    category = re.sub('Back to search results for |"', '', response.xpath('normalize-space(//span[contains(@class, "a-list-item")]//a/text())').extract_first())
    # Get the product link, for example (https://www.amazon.com/Lenovo-T430s-Performance-Professional-Refurbished/dp/B07L4FR92R/ref=sr_1_7?s=pc&ie=UTF8&qid=1545829464&sr=1-7&keywords=laptop)
    urlProducto = response.request.url

    # This can be done in a nicer way, just trying out if it works atm
    if category == 'Laptop':
        yield response.follow(urlProducto, callback = parse_laptop)

With parse_laptop defined as:

def parse_laptop(self, response):
    # Parse things
    pass

Any suggestions? When I run this code, I get an error saying that "parse_laptop" is not defined. I already tried putting parse_laptop above parse_item, but I still get the same error.

Answer 1

You need to reference the method through self rather than as a standalone function, so just change it like this:

yield response.follow(urlProducto, callback = self.parse_laptop)
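The scoping rule behind this: names defined in a class body are not visible inside its methods, because Python resolves a bare name through the local, enclosing-function, global and builtin scopes only, skipping the class namespace. That is also why moving parse_laptop above parse_item made no difference. A minimal Scrapy-free illustration (the Demo class and its methods are hypothetical):

```python
class Demo:
    def parse_item(self):
        try:
            # Bare name: not found in local, global or builtin scope
            return parse_laptop
        except NameError as exc:
            return exc

    def parse_item_fixed(self):
        # self.parse_laptop is a bound method; calling it passes self automatically
        return self.parse_laptop

    def parse_laptop(self):
        return 'laptop'


d = Demo()
print(type(d.parse_item()).__name__)  # NameError
print(d.parse_item_fixed()())         # laptop
```

The order of the def statements inside the class is irrelevant here: method bodies only run when called, and even then the class namespace is not searched for bare names.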

Answer 2

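This is your request:

yield response.follow(urlProducto, callback = parse_laptop)

As you may have noticed, def parse_laptop(self, response): takes self as its first parameter, i.e. parse_laptop is a method of your spider object, not a standalone function. So change your request to:

yield response.follow(urlProducto, callback = self.parse_laptop)

That should do the job.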
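Two further points may be worth checking. First, urlProducto is the URL of the page parse_item is already handling, and Scrapy's default duplicate filter drops requests it has already seen unless dont_filter=True is passed, so the followed request may be silently discarded and parse_laptop never called. Second, the if/else on the category can be replaced with a lookup table that hands the already-downloaded response to the right parser. A minimal sketch of that idea, assuming the category strings match the page text exactly; CATEGORY_PARSERS, dispatch, parse_tablet and ProductSpiderSketch are hypothetical names, and the class is a plain stand-in for the spider so the dispatch logic can run on its own:

```python
class ProductSpiderSketch:
    """Hypothetical stand-in for the spider; only the category dispatch is shown."""

    # Hypothetical mapping from the scraped category text to a parser method name.
    CATEGORY_PARSERS = {
        'Laptop': 'parse_laptop',
        'Tablet': 'parse_tablet',
    }

    def dispatch(self, category, response):
        """Yield whatever the matching parser yields; yield nothing for unknown categories."""
        method_name = self.CATEGORY_PARSERS.get(category)
        if method_name is None:
            return
        # getattr(self, ...) returns a bound method, just like self.parse_laptop
        parser = getattr(self, method_name)
        yield from parser(response)

    def parse_laptop(self, response):
        # Placeholder: a real spider would extract laptop fields here
        yield {'category': 'Laptop', 'url': response.url}

    def parse_tablet(self, response):
        # Placeholder: a real spider would extract tablet fields here
        yield {'category': 'Tablet', 'url': response.url}
```

Inside parse_item, the if block would then become yield from self.dispatch(category, response), which avoids requesting the same page a second time. If a separate request is really wanted, response.follow(urlProducto, callback=self.parse_laptop, dont_filter=True) bypasses the duplicate filter.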

Thanks.

Using multiple parsers when scraping a page
