Scrapy spider does not return all selectors

Time: 2018-07-30 22:50:24

Tags: xpath web-scraping scrapy

I am using the Scrapy library to scrape all the product names that appear at this URL: https://www.amazon.com/bose-soundlink/s?ie=UTF8&page=1&rh=i%3Aaps%2Ck%3Abose%20soundlink

This is my parse method:

    def parse(self, response):
        print("inside parse")

        # Each search result is an <li> whose id contains "result".
        for product in response.xpath('//li[contains(@id,"result")]'):
            print(product)

            # The title is the text of the <h2> inside the nested "a-row" divs.
            product_name = product.xpath(
                './/div[@class="a-row a-spacing-small"]'
                '//div[@class="a-row a-spacing-none"]/a/h2/text()'
            ).extract_first()

            if product_name is not None:
                print("Product name: " + product_name)

            print("---------------------------------------")

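However, I am not getting all the product names. The page lists 23 products, but my spider only scrapes 16 of them (the same 16 every time). Of those 16, I get the name for 14, and for the remaining 2 product_name is None. I don't know whether this is caused by a mistake in my parse method or by Amazon cleverly detecting bots.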
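
One way to tell those two explanations apart is to inspect the raw HTML that the spider actually received, independently of the XPath. The following is only a rough sketch using the standard library. It assumes the page body has been saved to a string (for example from response.text), and the names ResultCounter and summarize are made up for illustration. It counts the <li> elements whose id starts with "result_" and reports which of them contain no <h2> text at all.

```python
from html.parser import HTMLParser


class ResultCounter(HTMLParser):
    """Record every <li id="result_N"> and the text of any <h2> inside it."""

    def __init__(self):
        super().__init__()
        self.results = {}     # li id -> title text, or None if no <h2> text was seen
        self._current = None  # id of the result <li> currently open, if any
        self._depth = 0       # nesting depth of <li> tags inside the current result
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            li_id = dict(attrs).get("id") or ""
            if li_id.startswith("result_"):
                self._current, self._depth = li_id, 1
                self.results[li_id] = None
            elif self._current is not None:
                self._depth += 1  # a nested <li> inside the current result
        elif tag == "h2" and self._current is not None:
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
        elif tag == "li" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._in_h2 and self._current is not None and data.strip():
            previous = self.results[self._current] or ""
            self.results[self._current] = (previous + " " + data.strip()).strip()


def summarize(html):
    """Return (number of result items, ids of items with no <h2> text)."""
    counter = ResultCounter()
    counter.feed(html)
    counter.close()
    missing = [li_id for li_id, title in counter.results.items() if title is None]
    return len(counter.results), missing
```

If summarize reports fewer result items for the HTML Scrapy downloaded than for the same page saved from a browser, the server probably returned different markup to the spider. If the items are present and do contain <h2> text, but product_name is still None, the XPath is more likely too specific for those items.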

This is the output I get when I run scrapy crawl:

<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_0" data-asin="B077BPG968"'>
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_1" data-asin="B0748N1BZD"'>
Product name: Bose SoundLink Micro Bluetooth speaker - Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_2" data-asin="B01HETFQKS"'>
Product name: Bose 752195-0100 SoundLink Color Bluetooth speaker II - Soft black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_3" data-asin="B06XCQKTKR"'>
Product name: Bose SoundLink Revolve Portable Bluetooth 360 Speaker, Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_4" data-asin="B007RP7JU0"'>
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_5" data-asin="B06XCW4VFS"'>
Product name: Bose SoundLink Revolve+ Portable & Long-Lasting Bluetooth 360 Speaker - Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_6" data-asin="B0117RGG8E"'>
Product name: Bose SoundLink around-ear wireless headphones II Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_7" data-asin="B01LZRFI5C"'>
Product name: Bose soundlink mini II Limited Edition Bluetooth speaker
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_8" data-asin="B00WK47VEW"'>
Product name: Bose SoundLink Mini Bluetooth Speaker II (Carbon)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_9" data-asin="B00N32I2Q6"'>
Product name: Bose SoundLink Color Bluetooth Speaker (Black)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_10" data-asin="B00HWSXVDG'>
Product name: Bose SoundLink Bluetooth Speaker III
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_11" data-asin="B00D5Q75RC'>
Product name: Bose SoundLink Mini Bluetooth Speaker (Discontinued by Manufacturer)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_12" data-asin="B0090Z3SPU'>
Product name: SoundLink Bluetooth Mobile Speaker II - Nylon (Discontinued by Manufacturer)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_13" data-asin="B076LFBHKJ'>
Product name: Bose SoundLink Mini II Bluetooth Speaker, Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_14" data-asin="B074QRK6CG'>
Product name: Bose SoundLink On-Ear Bluetooth Headphones with Microphone, Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_15" data-asin="B078H4FH5K'>
Product name: Bose SoundLink Micro Waterproof Bluetooth speaker (Black) with AmazonBasics Case (Black)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="s-result-list-layout-placeholder'>
---------------------------------------

0 answers:

No answers