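My goal is to scrape a list of products read from a CSV file (I have that part figured out). However, when I run the program on my test URLs, each URL yields several items and only one of them contains the result I want. My code and some log output below should clarify.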
# -*- coding: utf-8 -*-
import scrapy
from ..items import LowesspiderItem
from scrapy.http import Request
import requests


class LowesSpider(scrapy.Spider):
    name = 'lowes'

    def start_requests(self):
        start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42',
                      'https://www.lowes.com/search?searchTerm=RA36']
        for url in start_urls:
            yield Request(url,
                          headers={'Cookie': 'sn=2333;'},  # Preset a location
                          meta={'dont_merge_cookies': True,  # Allows location cookie to get through
                                'url': url})  # Using to get the product SKU

    def parse(self, response):
        items = response.css('.grid-container')
        for product in items:
            item = LowesspiderItem()
            # get product price
            productPrice = product.css('.art-pd-price::text').get()
            # get SKU
            productSKU = response.meta['url']
            productSKU = productSKU.split('=')[-1]
            item["productSKU"] = productSKU
            item["productPrice"] = productPrice
            yield item
2020-04-21 14:09:48 [scrapy.core.engine] INFO: Spider opened
2020-04-21 14:09:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-04-21 14:09:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-04-21 14:09:48 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/robots.txt> (referer: None)
2020-04-21 14:09:48 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644> from <GET https://www.lowes.com/search?searchTerm=8654RM-42>
2020-04-21 14:09:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095> from <GET https://www.lowes.com/search?searchTerm=RA36>
2020-04-21 14:09:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644> (referer: None)
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': '1,449.95', 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644>
{'productPrice': None, 'productSKU': '8654RM-42'}
2020-04-21 14:09:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095> (referer: None)
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': '2,549.99', 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
2020-04-21 14:09:49 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Professional-Deep-Recessed-6-Burners-Convection-Stainless-Steel-Common-36-in-Actual-36-in/1000525095>
{'productPrice': None, 'productSKU': 'RA36'}
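The result I want would look like this: {'productPrice': '1,449.95', 'productSKU': '8654RM-42'}
However, my program returns a lot of duplicate results, and I think this is caused by the for loop iterating over my top-level items: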
    items = response.css('.grid-container')
    for product in items:
        item = LowesspiderItem()
Answer 0 (score: 1)
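The two URLs you are scraping each redirect to a single product page, so you don't need the for product in items loop. Judging by your log, the product page contains several .grid-container elements and only one of them holds the price, so the loop yields one item per container, most of them with a price of None.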