Question

I'm new to Scrapy. I loaded the page http://www.ikea.com/ae/en/catalog/categories/departments/childrens_ikea/31772/ with scrapy shell [url] and ran response.css(div.productTitle.Floatleft) to get the product names, but it gives the following error:

Traceback (most recent call last): File "", line 1, in NameError: name 'div' is not defined

How can I fix this?

Answer 1

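You have to pass the selector as a string: "div.productTitle.Floatleft". Note the quotation marks " ".

Without the quotes, Python treats div as a variable name, and no such variable exists.

EDIT: to get the correct data you also have to set the User-Agent.

Run the shell: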
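As a minimal illustration of the difference, the sketch below uses a hypothetical css function as a stand-in for response.css, so it runs without Scrapy. The unquoted form fails before the function is even called, because Python first tries to evaluate the name div.

```python
# Hypothetical stand-in for response.css(); it just returns the selector it receives.
def css(selector):
    return selector

try:
    # Unquoted: Python looks up a variable called div before calling css().
    css(div.productTitle.Floatleft)
except NameError as exc:
    error_message = str(exc)

# Quoted: the selector is an ordinary string and is passed through unchanged.
selector = css("div.productTitle.Floatleft")

print(error_message)
print(selector)
```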

scrapy shell http://www.ikea.com/ae/en/catalog/categories/departments/childrens_ikea/31772/

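In the shell you can open the HTML returned by the server in a web browser, where you will see an error message instead of the product page.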

view(response)

Fetch the page again with a different User-Agent (reusing the url from the previous response):

fetch(response.url, headers={'User-Agent': 'Mozilla/5.0'})

response.css('div.productTitle.floatLeft')

BTW: the class has to be floatLeft, not Floatleft. Note the lowercase f and the uppercase L.

EDIT: the same as a standalone script (no project needed)

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    # allowed_domains = ['www.ikea.com']  # domain names only, without the scheme

    start_urls = ['http://www.ikea.com/ae/en/catalog/categories/departments/childrens_ikea/31772/']

    def parse(self, response):
        print('url:', response.url)

        all_products = response.css('div.product')

        for product in all_products:
            title = product.css('div.productTitle.floatLeft ::text').extract()
            description = product.css('div.productDesp ::text').extract()
            price = product.css('div.price.regularPrice ::text').extract()
            # Guard against products without a regular price to avoid an IndexError.
            price = price[0].strip() if price else ''

            print('item:', title, description, price)

            yield {'title': title, 'description': description, 'price': price}

# --- runs without a project and saves the items to 'output.csv' ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'output.csv',
})
c.crawl(MySpider)
c.start()

Result in the file output.csv:

title,description,price
BÖRJA,feeding spoon and baby spoon,Dhs 5.00
BÖRJA,training beaker,Dhs 5.00
KLADD RANDIG,bib,Dhs 9.00
KLADDIG,bib,Dhs 29.00
MATA,4-piece eating set,Dhs 9.00
SMASKA,bowl,Dhs 9.00
SMASKA,plate,Dhs 12.00
SMÅGLI,plate/bowl,Dhs 19.00
STJÄRNBILD,bib,Dhs 19.00
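
A note for newer Scrapy versions: FEED_FORMAT and FEED_URI were deprecated in Scrapy 2.1 in favor of the FEEDS setting. A sketch of equivalent settings, assuming Scrapy 2.1 or later (the variable name settings is only for illustration):

```python
# Settings dict for CrawlerProcess; the FEEDS key replaces FEED_FORMAT and FEED_URI.
settings = {
    'USER_AGENT': 'Mozilla/5.0',
    'FEEDS': {
        # Output file path mapped to its export options.
        'output.csv': {'format': 'csv'},
    },
}

print(settings['FEEDS'])
```

It would be passed to the crawler as CrawlerProcess(settings) in place of the dict used in the script.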
