Why can't my spider crawl all items?

Time: 2019-08-28 14:27:46

Tags: python-3.x scrapy web-crawler amazon

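I'm new to Python and recently built a crawler for Amazon. My problem is that I never get all the items. I have a list of about 1,300 product links, but every time I run the crawler I get a different number of scraped items, anywhere between 700 and 1,100. What am I doing wrong that I don't get the information for all 1,300 items?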

import datetime

import scrapy

# Assumed location of the item class in a standard Scrapy project layout
from ..items import AmazoncrawlingItem

# Placeholders (assumption): the original post uses `now` and `website`
# without showing where they are defined.
now = datetime.datetime.now()
website = "amazon.de"


class AmazoncrawlerSpider(scrapy.Spider):
    name = 'amazonCrawler'
    with open("/Users/username/PycharmProjects/whiskywebsite/src/csv/product_links.csv", "r") as f:
        # Strip whitespace and skip blank lines, which are not valid URLs
        start_urls = [url.strip() for url in f if url.strip()]

    # parse() must be indented inside the class so Scrapy uses it as the default callback
    def parse(self, response):
        all_important_information = response.css('#dp')

        for information in all_important_information:
            product_name = information.css('#productTitle').css('::text').extract()
            product_name = [name.strip() for name in product_name]
            product_price = information.css('#price_inside_buybox').css('::text').extract()
            product_price = [price.strip() for price in product_price]
            #product_asin = information.css('.col2 tr:nth-child(1) .value').css('::text').extract()
            #product_asin = [asin.strip() for asin in product_asin]
            #product_rating = information.css('.a-icon-alt').css('::text').extract()
            #product_rating = [rating.strip() for rating in product_rating]
            #product_volume = information.css('.comparison_other_attribute_row:nth-child(11) .comparison_baseitem_column .a-color-base').css('::text').extract()
            #product_volume = [volume.strip() for volume in product_volume]
            #product_country = information.css('.comparison_other_attribute_row:nth-child(10) .comparison_baseitem_column .a-color-base').css('::text').extract()
            #product_country = [country.strip() for country in product_country]
            product_picture = information.css('#imgTagWrapperId').css('img::attr(data-old-hires)').extract()
            product_picture = [picture.strip() for picture in product_picture]
            # zip() stops at the shortest list: if any field is empty, no item is yielded
            result = zip(product_name, product_price, product_picture)

            for name, price, picture in result:
                items = AmazoncrawlingItem()
                items['Name'] = name
                items['Preis'] = price
                #items['Asin'] = asin
                items['Time'] = now
                #items['Rating'] = rating
                #items['Volume'] = volume
                #items['Country'] = country
                items['Picture'] = picture
                items['Website'] = website
                yield items
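One detail in this code can silently drop products: zip() stops at its shortest argument, so on any page where one selector matches nothing (for example, if #price_inside_buybox is absent on some product pages, which is an assumption about Amazon's markup), the inner loop runs zero times and no item is yielded even though the page was crawled with status 200. The plain-Python sketch below shows the effect with hypothetical values and one alternative: a hypothetical helper, first_or_none, that takes one value per field and records a missing field as None instead of dropping the item.

```python
def first_or_none(values):
    """Return the first non-empty stripped string in values, or None."""
    for value in values:
        value = value.strip()
        if value:
            return value
    return None


# Hypothetical extract() results for a page with a title and picture
# but no buy-box price.
product_name = ["  Example Whisky 0.7l  "]
product_price = []
product_picture = ["https://example.com/img.jpg"]

# zip() stops at its shortest argument, so one empty list gives zero tuples.
zipped = list(zip(product_name, product_price, product_picture))

# One value per field keeps the item and makes the gap visible as None.
item = {
    "Name": first_or_none(product_name),
    "Preis": first_or_none(product_price),
    "Picture": first_or_none(product_picture),
}

print(zipped)  # []
print(item)
```

Inside a Scrapy callback, SelectorList.get() (or the older extract_first()) behaves similarly for a selector: it returns the first match, or None when nothing matches.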

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.de/...link ..> (referer: None)

Does this have something to do with "(referer: None)"? If so, what does it mean?
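For context: "(referer: None)" is expected for URLs listed in start_urls. Those requests are not issued from another page, so they carry no Referer header, and the log line prints None. "Crawled (200)" only says that the page was downloaded. When an item is yielded, Scrapy logs a separate DEBUG line of the form "Scraped from <200 url>". The standard-library sketch below, which assumes Scrapy's default log message format, compares the two kinds of lines and lists the URLs that were downloaded but produced no item. The function name pages_without_items is hypothetical.

```python
import re

# Two Scrapy DEBUG messages (default format assumed):
#   "Crawled (200) <GET url> (referer: ...)"  -> the page was downloaded
#   "Scraped from <200 url>"                 -> an item was yielded from it
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET ([^>\s]+)>")
SCRAPED = re.compile(r"Scraped from <(\d{3}) ([^>\s]+)>")


def pages_without_items(log_lines):
    """Return sorted URLs crawled with status 200 that yielded no item."""
    crawled, scraped = set(), set()
    for line in log_lines:
        match = CRAWLED.search(line)
        if match and match.group(1) == "200":
            crawled.add(match.group(2))
        match = SCRAPED.search(line)
        if match:
            scraped.add(match.group(2))
    return sorted(crawled - scraped)
```

Usage, assuming the log was written to a file, for example with scrapy crawl amazonCrawler -s LOG_FILE=crawl.log (the file name is an assumption): open crawl.log and pass the file object to pages_without_items. Opening a few of the returned URLs, or saving response.text for them, should show what those pages actually contained.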

0 answers
