Scrapy output feed directly to Google Drive

Time: 2019-05-05 21:18:12

Tags: python scrapy google-drive-api

I have a Scrapy spider whose output I want to send directly to my Google Drive. I found PyDrive easy to use for uploading files (I tested it and it works):

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()

drive = GoogleDrive(gauth)

file2 = drive.CreateFile()
file2.SetContentFile('testing1.csv')
file2.Upload()

How can I combine this with scrapy runspider test1.py -o test.csv so that the output is uploaded directly to Drive?

If that is not possible, any suggestions?

2 answers:

Answer 0: (score: 1)

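It doesn't work like that. Here is how to do it without scrapy runspider test1.py: build a list of dictionaries, write it to a CSV file, upload that file with the Google functions, and then delete the file you created.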

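import scrapy
from scrapy.crawler import CrawlerProcess
# Your spider class goes here

if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(NAME_OF_YOUR_SPIDER)
    # Keep a reference to the spider so its collected data can be read later
    spider = next(iter(process.crawlers)).spider
    # start() blocks until the crawl has finished
    process.start()
    # write to csv
    # upload to Google Drive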
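
The write, upload, delete sequence described above could be sketched roughly as follows. This is only an illustration: `export_and_upload` and `upload_with_pydrive` are hypothetical helper names, the rows are assumed to be plain dictionaries, and the upload step is passed in as a callable so that the CSV handling can be tried without Google credentials.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, List


def export_and_upload(rows: List[Dict[str, str]], upload: Callable[[str], None]) -> str:
    """Write rows to a temporary CSV, pass its path to `upload`, then delete it."""
    # Collect column names in first-seen order across all rows.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        upload(path)
    finally:
        # Remove the local copy whether or not the upload succeeded.
        os.remove(path)
    return path


def upload_with_pydrive(path: str) -> None:
    """Hypothetical uploader that reuses the PyDrive calls from the question."""
    # Imported lazily so export_and_upload works without PyDrive installed.
    from pydrive.auth import GoogleAuth
    from pydrive.drive import GoogleDrive

    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()
    drive = GoogleDrive(gauth)
    file2 = drive.CreateFile({"title": os.path.basename(path)})
    file2.SetContentFile(path)
    file2.Upload()
```

After `process.start()` returns, something like `export_and_upload(rows, upload_with_pydrive)` would then replace the two placeholder comments, where `rows` is whatever list of dictionaries your spider collected.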

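Answer 1: (score: 1)

You need to write a custom pipeline or feed exporter.

For example, if your crawler is small and the results fit in memory, a simple pipeline like this will do: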

# myproject/pipelines.py

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

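class GdrivePipeline:
    def open_spider(self, spider):
        # Start each run with a fresh list instead of a shared class-level one
        self.data = []

    def process_item(self, item, spider):
        # Buffer every scraped item in memory until the spider closes
        self.data.append(item)
        return item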

    def close_spider(self, spider):
        gauth = GoogleAuth()
        gauth.LocalWebserverAuth()

        drive = GoogleDrive(gauth)

        file2 = drive.CreateFile()
        # write self.data to file
        file2.Upload()

Then activate it in your settings:

ITEM_PIPELINES = {
    'myproject.pipelines.GdrivePipeline': 999,
}
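
The `# write self.data to file` step in the pipeline is left open. One possible way to fill it in, assuming the items are dicts or dict-like Scrapy items, is to serialize them to a CSV string in memory; `items_to_csv` is a hypothetical helper name used only for this sketch.

```python
import csv
import io


def items_to_csv(items):
    """Serialize dict-like items to a CSV string without touching the disk."""
    rows = [dict(item) for item in items]

    # Collect column names in first-seen order across all items.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from an item become empty cells
    return buffer.getvalue()
```

Inside `close_spider`, the result could then be handed to PyDrive with `file2.SetContentString(items_to_csv(self.data))` before calling `file2.Upload()`. Passing a title such as `drive.CreateFile({'title': 'items.csv'})` gives the uploaded file a name; `items.csv` is just an example.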