Question

Hello, I wrote the following Scrapy code. I want to save the titles of all the given URLs in a single file, but it only saves the last title ("url3").

    from scrapy.spider import BaseSpider
    from scrapy.selector import Selector
    from scrapy.http import HtmlResponse
    from kirt.items import KirtItem 

    class KirtSpider(BaseSpider):

        name = "spider-name"

        allowed_domains = ["url1","url2","url3"]

        start_urls = ["url1","url2","url3"]


        def parse(self,response):

            sel = Selector(response)
            title = str(sel.xpath('//title/text()').extract())

            with open('alltitles.txt','w') as f:
                f.seek(0)
                f.write(title)

Answer 1

The problem is here, in two different ways:

    with open('alltitles.txt','w') as f:
        f.seek(0)
        f.write(title)

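Opening a file with mode 'w' does not just open it: if a file with that name already exists, its contents are discarded first. Since parse() runs once per response, each call wipes out what the previous call wrote. You should open the file with mode 'a' instead, which appends new lines to the existing file (and creates the file if it does not exist yet).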
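The difference can be seen without Scrapy. Below is a minimal sketch, assuming a throwaway file in a temporary directory; the file name `demo.txt` and the helper `write_all` are made up for illustration and are not part of the question's code.

```python
import os
import tempfile


def write_all(path, mode, lines):
    """Reopen `path` once per line with the given mode, the way
    parse() reopens the file once per response, then read it back."""
    for line in lines:
        with open(path, mode) as f:
            f.write(line + "\n")
    with open(path) as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    titles = ["title1", "title2", "title3"]

    # 'w' truncates on every open, so only the last write survives.
    overwritten = write_all(path, "w", titles)

    os.remove(path)  # start again from a clean slate

    # 'a' keeps what is already there and adds to the end.
    appended = write_all(path, "a", titles)

print(overwritten)
print(appended)
```

Keep in mind that with 'a' the file also keeps growing from one crawl to the next, so remove or empty it before a fresh run if you only want the latest results.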

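You were also calling f.seek(0), which moves the file's write pointer back to the start of the file, so whatever is written next overwrites the contents from the beginning. (With mode 'w' the file has just been emptied, so the call is redundant there, but it should go either way.) The code should look more like this: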

    with open('alltitles.txt','a') as f:
        # write out the title and add a newline.
        f.write(title + "\n")
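One more detail worth checking: in the question, `title` is built with `str(...)` around the list that `extract()` returns, so what lands in the file is the list's printed form, brackets and quotes included. A rough sketch of writing the plain text instead follows. It assumes `extracted` stands in for the list returned by `sel.xpath('//title/text()').extract()`; the helper name `append_titles` and the sample titles are hypothetical.

```python
import os
import tempfile


def append_titles(path, extracted):
    """Append each extracted title to `path`, one per line.

    `extracted` plays the role of the list of strings that
    sel.xpath('//title/text()').extract() returns in the spider.
    """
    with open(path, "a") as f:
        for title in extracted:
            f.write(title.strip() + "\n")


# What the question's code writes: the repr of the list, not the text.
as_written = str(["First page"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "alltitles.txt")
    # Simulate three parse() calls, one per start URL (made-up titles).
    for extracted in (["First page"], [" Second page "], ["Third page"]):
        append_titles(path, extracted)
    with open(path) as f:
        saved = f.read().splitlines()

print(as_written)
print(saved)
```

Inside the spider, the body of parse() would then reduce to a call such as `append_titles('alltitles.txt', sel.xpath('//title/text()').extract())`.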

Saving URL titles to a text file with Scrapy
