Scrapy - Getting multiple items from a single link

Date: 2020-04-22 21:44:07

Tags: python css scrapy

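My goal is to scrape a list of URLs read from a CSV file (I have that part figured out). However, when I run the program on a test URL, that URL yields several items and only one of them contains the result I want. My code and log output below should make this clearer.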

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request

from ..items import LowesspiderItem


class LowesSpider(scrapy.Spider):
    name = 'lowes'

    def start_requests(self):
        start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42',
                      'https://www.lowes.com/search?searchTerm=RA36']
        for url in start_urls:
            yield Request(url,
                          headers={'Cookie': 'sn=2333;'},  # Preset a location
                          meta={'dont_merge_cookies': True,  # Allows location cookie to get through
                                'url': url})  # Used to get the product SKU

    def parse(self, response):
        items = response.css('.grid-container')
        for product in items:
            item = LowesspiderItem()

            # Get product price
            productPrice = product.css('.art-pd-price::text').get()
            # Get SKU from the original search URL
            productSKU = response.meta['url']
            productSKU = productSKU.split('=')[-1]

            item["productSKU"] = productSKU
            item["productPrice"] = productPrice

            yield item

2020-04-21 14:09:48 [scrapy.core.engine] INFO: Spider opened
2020-04-21 14:09:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-21 14:09:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-21 14:09:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/robots.txt> (referer: None)
2020-04-21 14:09:48 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644> from <GET https://www.lowes.com/search?searchTerm=8654RM-42>
2020-04-21 14:09:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095> from <GET https://www.lowes.com/search?searchTerm=RA36>
2020-04-21 14:09:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644> (referer: None)
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': '1,449.95', 'productSKU': '8654RM-42'}

2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095> (referer: None)
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': '2,549.99', 'productSKU': 'RA36'}

2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}

So the result I want would look like this: {'productPrice': '1,449.95', 'productSKU': '8654RM-42'}

However, my program produces many duplicate results. I think this is caused by the for loop iterating over my top-level selector:

items = response.css('.grid-container')
for product in items:
    item = LowesspiderItem()

Here is also a screenshot of the Excel output: [image]

1 Answer:

Answer 0 (score: 1)

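Both URLs you are scraping redirect (301) to a single product page, so you do not need the for product in items loop. The loop yields one item for every .grid-container element on the page, and only one of those elements contains the price, which is why the other items come back with productPrice set to None.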