Question

I've written a script in Python Scrapy to scrape names and prices from a webpage and write them to a CSV file. The script runs flawlessly.

However, when the crawl finishes, I notice that the results in the CSV file have a uniform gap: there is a blank line between every two rows.

At this point I tried adding a few lines to my spider to get clean output, and the CSV output I get now has no blank lines between rows.
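For reference, a gap like the one described above is what `csv.writer` produces when the file's text layer translates `\n` into `\r\n` on write, which is the default on Windows. The writer already ends each row with `\r\n`, so the row ends up terminated by `\r\r\n`. The following is a minimal sketch, independent of Scrapy, that reproduces this on any platform. It assumes that opening the file with `newline="\r\n"` is a fair stand-in for the Windows default; the helper name `csv_bytes` and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def csv_bytes(rows, newline):
    """Write rows with csv.writer and return the raw bytes that land on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerows(rows)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# Made-up sample data, not taken from the real site.
rows = [["Lipstick", "AED 95"], ["Mascara", "AED 120"]]

# newline="" disables newline translation, so the writer's "\r\n" is kept as-is.
fixed = csv_bytes(rows, newline="")

# newline="\r\n" turns every "\n" written into "\r\n" (simulating the Windows
# default), so each row ends in "\r\r\n" and spreadsheet tools show a blank row.
windows_like = csv_bytes(rows, newline="\r\n")
```

Comparing `fixed` and `windows_like` shows the extra `\r` per row, which is consistent with `newline=''` in the `open()` call being the part of the script that removes the gap.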

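My question is: did I do this the right way? I did not set up any relation between "items.py" and "sephsp.py", yet I still get results. Does the "items.py" file have any control over the "sephsp.py" file? Finally, in my spider class I declared writer as "global" so that I could use it inside the "target_page" method to write the two fields to the CSV file. Thanks in advance.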

This is the CSV output I was getting: Click to see. If the script below is correct, that is now fixed.

This is the script I tried:

"items.py" contains:

import scrapy
class SephoraItem(scrapy.Item):
    name = scrapy.Field()       # I couldn't find any way to make a bridge between this name and the name in spider class
    price = scrapy.Field()

The spider file contains:

import scrapy
import csv

outfile = open("Sephora.csv","w",newline='')
writer = csv.writer(outfile)

class SephoraSpider(scrapy.Spider):
    name = "sephorasp"
    start_urls = ["https://www.sephora.ae/en/stores/"]

    def parse(self, response):
        for link in response.css('ul.nav-primary a.level0::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.parse_inner_pages)

    def parse_inner_pages(self, response):
        for link in response.css('li.amshopby-cat > a::attr(href)').extract():
            yield scrapy.Request(url=link, callback=self.target_page)

    def target_page(self, response):
        global writer                                   # Here I've used writer as global
        for titles in response.css('div.product-info'):
            name = titles.css('.product-name > a::text').extract_first()
            price = titles.css('span.price::text').extract_first()
            yield {'name': name, 'price': price}  # Yielded only to show that I have no issue with printing the results
            writer.writerow([name,price])

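One last thing: if I don't want to declare "writer" as global, what alternative would make the writer available inside the "target_page()" method so it can write the two fields?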
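One possible direction, offered as a sketch rather than a definitive answer, is to keep the file and the writer as instance attributes and reach them through `self`. (As a side note, `global` is only required when a function rebinds a module-level name; merely calling `writer.writerow(...)` works without it.) The class below is a hypothetical plain-Python stand-in so that it runs without Scrapy: `RowWriter` and the `products` argument are invented for illustration. In a real spider the same two attributes would presumably be set in `__init__` and the file closed in the spider's `closed(reason)` method.

```python
import csv
import os
import tempfile


class RowWriter:
    """Hypothetical stand-in for the spider: it owns its CSV file and writer."""

    def __init__(self, path):
        # newline="" keeps csv.writer's own "\r\n" row endings untouched.
        self.outfile = open(path, "w", newline="")
        self.writer = csv.writer(self.outfile)

    def target_page(self, products):
        # The writer is reached through self, so no global declaration is needed.
        for name, price in products:
            self.writer.writerow([name, price])
            yield {"name": name, "price": price}

    def closed(self, reason):
        # Mirrors the name of Scrapy's Spider.closed hook; here it only closes the file.
        self.outfile.close()


# Usage with made-up data.
path = os.path.join(tempfile.mkdtemp(), "Sephora.csv")
spider = RowWriter(path)
items = list(spider.target_page([("Lipstick", "AED 95")]))
spider.closed("finished")
```

Closing the file explicitly also makes sure buffered rows are flushed to disk, which the module-level `open()` in the original script never guarantees.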

Fixing line gaps between rows in Scrapy output

0 answers: