I am using the Scrapy library to scrape the names of all products that appear at this URL: https://www.amazon.com/bose-soundlink/s?ie=UTF8&page=1&rh=i%3Aaps%2Ck%3Abose%20soundlink
Here is my parse method:
def parse(self, response):
    print("inside parse")
    # Treat every <li> whose id contains "result" as a product
    for product in response.xpath('//li[contains(@id,"result")]'):
        print(product)
        product_name = product.xpath('.//div[@class="a-row a-spacing-small"]//div[@class="a-row a-spacing-none"]/a/h2/text()').extract_first()
        # extract_first() returns None when the XPath matches nothing
        if product_name is not None:
            print("Product name: " + product_name)
        print("---------------------------------------")
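However, I don't get all the product names. There are 23 products on this page, but my spider only scrapes 16 of them (and it retrieves the same 16 every time). Of those 16, I get the names of 14; for the remaining 2, product_name is None. I can't tell whether this is caused by a mistake in my parse function or by Amazon cleverly targeting bots.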
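The page markup isn't shown here, so this is only a guess at why two of the sixteen items give None: the XPath requires one specific div nesting and then takes only the direct text nodes of the h2 (h2/text()). If an item uses a different layout (a sponsored result, for instance) or wraps its heading text in a child tag such as a span, that expression matches nothing and extract_first() returns None. The stdlib-only sketch below (no Scrapy needed, so it can be tried on a saved page) illustrates a more forgiving approach: select items by their data-asin attribute instead of by id, and take all text under the item's h2. The class and function names are made up for illustration, and the sample HTML is invented, not Amazon's real markup.

```python
from html.parser import HTMLParser


class ResultTitleParser(HTMLParser):
    """Collect (data-asin, heading text) for each <li> that has a data-asin.

    Assumes result <li> elements are not nested inside one another.
    The heading text is ALL text under the <h2>, including text inside
    child tags, with whitespace collapsed (like XPath normalize-space()).
    """

    def __init__(self):
        super().__init__()
        self.results = []        # list of (asin, title or None)
        self._asin = None        # data-asin of the current <li>, if any
        self._title = None       # heading text found in the current <li>
        self._h2_parts = None    # text chunks while inside an <h2>, else None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "data-asin" in attrs:
            self._asin, self._title = attrs["data-asin"] or "", None
        elif tag == "h2" and self._asin is not None:
            self._h2_parts = []

    def handle_data(self, data):
        if self._h2_parts is not None:
            self._h2_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._h2_parts is not None:
            text = " ".join("".join(self._h2_parts).split())
            self._title = self._title or text or None  # keep the first non-empty h2
            self._h2_parts = None
        elif tag == "li" and self._asin is not None:
            # Record the item even if no heading was found (title stays None)
            self.results.append((self._asin, self._title))
            self._asin = None


def extract_titles(html):
    parser = ResultTitleParser()
    parser.feed(html)
    parser.close()
    return parser.results


sample = """
<ul>
  <li id="result_0" data-asin="A1"><a><h2><span>Nested  title</span></h2></a></li>
  <li id="result_1" data-asin="A2"><a><h2>Plain title</h2></a></li>
  <li id="result_2" data-asin="A3"><div>no heading here</div></li>
  <li id="s-result-list-layout-placeholder"></li>
</ul>
"""
print(extract_titles(sample))
# [('A1', 'Nested title'), ('A2', 'Plain title'), ('A3', None)]
```

On the first sample item, h2/text() would select nothing because the text sits inside a span, while the all-text approach still finds it. Inside a Scrapy parse method the same idea would roughly correspond to iterating over response.xpath('//li[@data-asin]') and reading product.xpath('normalize-space(.//h2)').get(), which returns an empty string rather than None when there is no h2. This also assumes the placeholder li has no data-asin attribute, which the truncated output below does not show.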
Here is the output I get when I run scrapy crawl:
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_0" data-asin="B077BPG968"'>
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_1" data-asin="B0748N1BZD"'>
Product name: Bose SoundLink Micro Bluetooth speaker - Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_2" data-asin="B01HETFQKS"'>
Product name: Bose 752195-0100 SoundLink Color Bluetooth speaker II - Soft black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_3" data-asin="B06XCQKTKR"'>
Product name: Bose SoundLink Revolve Portable Bluetooth 360 Speaker, Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_4" data-asin="B007RP7JU0"'>
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_5" data-asin="B06XCW4VFS"'>
Product name: Bose SoundLink Revolve+ Portable & Long-Lasting Bluetooth 360 Speaker - Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_6" data-asin="B0117RGG8E"'>
Product name: Bose SoundLink around-ear wireless headphones II Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_7" data-asin="B01LZRFI5C"'>
Product name: Bose soundlink mini II Limited Edition Bluetooth speaker
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_8" data-asin="B00WK47VEW"'>
Product name: Bose SoundLink Mini Bluetooth Speaker II (Carbon)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_9" data-asin="B00N32I2Q6"'>
Product name: Bose SoundLink Color Bluetooth Speaker (Black)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_10" data-asin="B00HWSXVDG'>
Product name: Bose SoundLink Bluetooth Speaker III
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_11" data-asin="B00D5Q75RC'>
Product name: Bose SoundLink Mini Bluetooth Speaker (Discontinued by Manufacturer)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_12" data-asin="B0090Z3SPU'>
Product name: SoundLink Bluetooth Mobile Speaker II - Nylon (Discontinued by Manufacturer)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_13" data-asin="B076LFBHKJ'>
Product name: Bose SoundLink Mini II Bluetooth Speaker, Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_14" data-asin="B074QRK6CG'>
Product name: Bose SoundLink On-Ear Bluetooth Headphones with Microphone, Triple Black
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="result_15" data-asin="B078H4FH5K'>
Product name: Bose SoundLink Micro Waterproof Bluetooth speaker (Black) with AmazonBasics Case (Black)
---------------------------------------
<Selector xpath='//li[contains(@id,"result")]' data='<li id="s-result-list-layout-placeholder'>
---------------------------------------
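In the output above, the loop matched 17 li elements: result_0 through result_15, plus one with id s-result-list-layout-placeholder, which contains(@id,"result") also picks up. So the XPath appears to see only 16 real result items, which suggests (this is not confirmed) that the other seven were not in the HTML Scrapy received, for example because they are loaded later by JavaScript or because Amazon serves a different page to non-browser clients. One way to check is to open the page as Scrapy sees it with scrapy view followed by the URL, or to save response.text to a file and count the result ids in it. A minimal stdlib sketch of that count follows; the function name and the result_<number> id pattern (taken from the output above) are assumptions.

```python
import re
from html.parser import HTMLParser

# Assumption: real result items have ids like "result_0", "result_15", ...
RESULT_ID = re.compile(r"result_\d+")


class ResultIdCollector(HTMLParser):
    """Record the id of every <li> whose id is exactly result_<number>."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        li_id = dict(attrs).get("id") or ""
        if tag == "li" and RESULT_ID.fullmatch(li_id):
            self.ids.append(li_id)


def result_item_ids(html):
    """Return result_<n> ids in document order; placeholders are skipped."""
    collector = ResultIdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


demo = ('<ul><li id="result_0"></li><li id="result_1"></li>'
        '<li id="s-result-list-layout-placeholder"></li></ul>')
print(result_item_ids(demo))  # ['result_0', 'result_1']

# Hypothetical offline usage, assuming parse() saved the page with e.g.
#     open("page.html", "w", encoding="utf-8").write(response.text)
# ids = result_item_ids(open("page.html", encoding="utf-8").read())
# print(len(ids), ids[-1])
```

If the saved response stops at result_15 while the same page in a browser shows 23 products, the missing items are not in what Scrapy downloads, and no change to parse() alone will find them.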