Question

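I'm new to Python, and recently I built a crawler for Amazon. My problem is that I never get all the items. I have a list of about 1300 product links. But when I run the crawler, the number of scraped items varies: it fluctuates between 700 and 1100. What am I doing wrong that I don't get the information for all 1300 items?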
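Before looking at the spider, it may help to confirm how many usable URLs the link file actually contains, since "about 1300" is the baseline every run is compared against. A minimal stdlib sketch; `summarize_links` is a hypothetical helper and the sample URLs are made up:

```python
from collections import Counter


def summarize_links(lines):
    """Count total lines, blank lines, duplicate URLs and unique URLs."""
    urls = [line.strip() for line in lines]
    non_blank = [u for u in urls if u]
    counts = Counter(non_blank)
    # Every repeat beyond the first occurrence of a URL counts as a duplicate
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "total_lines": len(urls),
        "blank": len(urls) - len(non_blank),
        "duplicates": duplicates,
        "unique": len(counts),
    }


if __name__ == "__main__":
    sample = [
        "https://www.amazon.de/dp/A\n",
        "\n",
        "https://www.amazon.de/dp/B\n",
        "https://www.amazon.de/dp/A\n",
    ]
    print(summarize_links(sample))
    # {'total_lines': 4, 'blank': 1, 'duplicates': 1, 'unique': 2}
    # With the real file (path from the question):
    # with open("/Users/username/PycharmProjects/whiskywebsite/src/csv/product_links.csv") as f:
    #     print(summarize_links(f))
```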

import scrapy
from datetime import datetime

# Assumed location of the item class (standard Scrapy layout: spiders/ next to items.py)
from ..items import AmazoncrawlingItem


class AmazoncrawlerSpider(scrapy.Spider):
    name = 'amazonCrawler'
    # One product URL per line; blank lines are skipped so no empty URL becomes a request
    with open("/Users/username/PycharmProjects/whiskywebsite/src/csv/product_links.csv", "r") as f:
        start_urls = [url.strip() for url in f if url.strip()]

    # parse() is a method of the spider, so it must be indented inside the class
    def parse(self, response):
        # `now` and `website` are not defined in the snippet; placeholder values (assumption)
        now = datetime.now().isoformat()
        website = 'amazon.de'

        all_important_information = response.css('#dp')

        for information in all_important_information:
            product_name = information.css('#productTitle').css('::text').extract()
            product_name = [name.strip() for name in product_name]
            product_price = information.css('#price_inside_buybox').css('::text').extract()
            product_price = [price.strip() for price in product_price]
            #product_asin = information.css('.col2 tr:nth-child(1) .value').css('::text').extract()
            #product_asin = [asin.strip() for asin in product_asin]
            #product_rating = information.css('.a-icon-alt').css('::text').extract()
            #product_rating = [rating.strip() for rating in product_rating]
            #product_volume = information.css('.comparison_other_attribute_row:nth-child(11) .comparison_baseitem_column .a-color-base').css('::text').extract()
            #product_volume = [volume.strip() for volume in product_volume]
            #product_country = information.css('.comparison_other_attribute_row:nth-child(10) .comparison_baseitem_column .a-color-base').css('::text').extract()
            #product_country = [country.strip() for country in product_country]
            product_picture = information.css('#imgTagWrapperId').css('img::attr(data-old-hires)').extract()
            product_picture = [picture.strip() for picture in product_picture]
            # zip() pairs the lists element by element and stops at the shortest one
            result = zip(product_name, product_price, product_picture)

            for name, price, picture in result:
                items = AmazoncrawlingItem()
                items['Name'] = name
                items['Preis'] = price
                #items['Asin'] = asin
                items['Time'] = now
                #items['Rating'] = rating
                #items['Volume'] = volume
                #items['Country'] = country
                items['Picture'] = picture
                items['Website'] = website
                yield items
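
One thing worth checking in the code above: `zip()` stops at the shortest of its inputs, so if any single selector (for example `#price_inside_buybox`, which may match nothing when a product has no buy-box price) returns an empty list on a page, `result` is empty and that page yields no item at all. Whether this explains the missing items is an assumption; the sketch below only demonstrates the `zip()` behavior with plain lists, and both helper names are hypothetical:

```python
def build_rows(names, prices, pictures):
    """Pair the extracted fields the way the spider does, via zip()."""
    return list(zip(names, prices, pictures))


def build_row_with_defaults(names, prices, pictures, default=None):
    """Return exactly one row per page, using `default` for any missing field."""
    def first(values):
        return values[0] if values else default
    return (first(names), first(prices), first(pictures))


if __name__ == "__main__":
    # A product page where '#price_inside_buybox' matched nothing
    names, prices, pictures = ["Some Whisky"], [], ["https://example.com/img.jpg"]
    print(build_rows(names, prices, pictures))               # []
    print(build_row_with_defaults(names, prices, pictures))  # ('Some Whisky', None, 'https://example.com/img.jpg')
```

In the spider itself the same idea is usually written with `.get(default='')` instead of `.extract()` plus `zip()`, e.g. `information.css('#price_inside_buybox::text').get(default='').strip()`, so each page yields one item even when a field is missing.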

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.de/...link ..> (referer: None)

Does it have something to do with "(referer: None)"? If so, what does that mean?
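"(referer: None)" only means the request had no referring page, which is normal for requests generated from `start_urls`; it is not an error. "Crawled (200)" likewise only says the HTTP download succeeded. If Amazon answers with a robot-check page instead of the product page, the status can still be 200 while `#dp` matches nothing, so no item is produced. A heuristic sketch for labelling such responses, assuming the robot-check page contains a form posting to `/errors/validateCaptcha` (an observed marker, not something Amazon documents); the function names are hypothetical:

```python
def looks_like_robot_check(html: str) -> bool:
    """Heuristic: True if the HTML looks like Amazon's robot-check page.

    Assumption: that page contains a form posting to /errors/validateCaptcha.
    The marker is undocumented and may change.
    """
    return "/errors/validateCaptcha" in html


def classify(html: str) -> str:
    """Label a 200 response so missing items can be counted by cause."""
    if looks_like_robot_check(html):
        return "robot-check"
    # Crude substring check for the container the spider selects with '#dp'
    if 'id="dp"' not in html:
        return "no-product-container"
    return "product-page"
```

In `parse()`, the check would go first, e.g. `if looks_like_robot_check(response.text): self.logger.warning('Robot check: %s', response.url); return`, which makes skipped pages visible in the log.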

Why doesn't my spider crawl all the items?
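One way to narrow this down is to compare `item_scraped_count` with `downloader/response_status_count/200` in the stats Scrapy prints at the end of a run: if far more pages were downloaded than items were scraped, the pages arrived but yielded nothing. If Amazon is rate-limiting the crawl (an assumption, not confirmed by the log above), slower and retried requests may help. A sketch of more conservative settings; the keys are standard Scrapy settings, but the values are guesses to tune, not recommendations from the original post:

```python
# Hypothetical values; every key is a standard Scrapy setting.
CUSTOM_SETTINGS = {
    "DOWNLOAD_DELAY": 2,                  # seconds between consecutive requests to the same site
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server load; DOWNLOAD_DELAY is the lower bound
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # fewer parallel requests to amazon.de
    "RETRY_ENABLED": True,
    "RETRY_TIMES": 3,                     # retries in addition to the first download attempt
    "RETRY_HTTP_CODES": [500, 502, 503, 504, 408, 429],
}
```

In the spider this would be set as the class attribute `custom_settings = CUSTOM_SETTINGS`. Retries only cover failed HTTP statuses; a robot-check page served with status 200 is not retried automatically.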

0 answers: