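AttributeError in Scrapy web scraping

Date: 2020-04-25 12:55:53

Tags: python web-scraping scrapy

I wrote a spider to scrape a website, but it raises an AttributeError. I'm new to web scraping, so please guide me on how to fix this error. Here is the error message: AttributeError: 'str' object has no attribute 'xpath'

Here is my code: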

# -*- coding: utf-8 -*-
import scrapy


class ShopSpider(scrapy.Spider):
    name = 'shop'
    allowed_domains = ['https://www.redbubble.com']
    start_urls = ['https://www.redbubble.com/shop/shower-curtains/']

    def parse(self, response):

        products = response.xpath("//a[@class='styles__link--2sYi3']").get()
        for product in products:
            product_url = product.xpath(".//img[@class='styles__image--2CwxX styles__productImage--3ZNPD styles__rounded--1lyoH styles__fluid--3dxe-']/@src").get()
            title = name = product.xpath(".//div[@class='styles__box--206r9 styles__paddingRight-0--fzRHs']/div[@class='styles__textContainer--1xehi styles__disableLineHeight--3n9Fg styles__nowrap--2Vk3A']/span/text()").get()
            yield {
                'name'  :   title,
                'url'   :   product_url
            }

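2 answers:

Answer 0 (score: 0)

The error message says it all.

You are calling the xpath method on a string: .get() returns the first match as a plain str, not as a selector.

Change

products = response.xpath("//a[@class='styles__link--2sYi3']").get()

to

products = response.xpath("//a[@class='styles__link--2sYi3']")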

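Answer 1 (score: -1)

Here is the code that worked for me. You get the str error because .get() returns a plain string, and a string has no xpath method. Iterate over the result of response.xpath directly in the for loop instead. You can also remove allowed_domains (if you keep it, it should list bare domain names such as 'redbubble.com', not full URLs).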

import scrapy


class ShopSpider(scrapy.Spider):
    name = 'shop'
    start_urls = ['https://www.redbubble.com/shop/shower-curtains/']

    def parse(self, response):
        # Iterate over the SelectorList itself; each product is a Selector
        for product in response.xpath("//a[@class='styles__link--2sYi3']"):
            # Relative XPath (leading ".") searches only inside this product
            product_url = product.xpath(
                ".//img[@class='styles__image--2CwxX styles__productImage--3ZNPD styles__rounded--1lyoH styles__fluid--3dxe-']/@src").get()
            title = product.xpath(".//div[@class='styles__box--206r9 styles__paddingRight-0--fzRHs']/div[@class='styles__textContainer--1xehi styles__disableLineHeight--3n9Fg styles__nowrap--2Vk3A']/span/text()").get()
            yield {
                "title": title,
                "url": product_url
            }