I have a Scrapy spider and I want its output to go straight to my Google Drive. I found that pydrive is easy to use for uploading files (I tested it and it works):
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
file2 = drive.CreateFile()
file2.SetContentFile('testing1.csv')
file2.Upload()
How can I combine this with scrapy runspider test1.py -o test.csv so that the output is uploaded straight to Drive?
If that is not possible, are there any suggestions?
Answer 0 (score: 1)
It does not work that way. Here is how to run the spider without scrapy runspider test1.py: build a list of dictionaries, write it to a CSV file, upload that file with your Google Drive function, and then delete the file you created.
import scrapy
from scrapy.crawler import CrawlerProcess

# Your Spider definition goes here

if __name__ == "__main__":
    process = CrawlerProcess()
    process.crawl(NAME_OF_YOUR_SPIDER)
    # keep a reference to the spider instance so its collected data
    # can be read after the crawl has finished
    spider = next(iter(process.crawlers)).spider
    process.start()
    # write the collected items to a CSV file
    # upload the CSV to Google Drive, then delete the local file
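A minimal sketch of those two placeholder steps, assuming the spider keeps its scraped results in a list of plain dicts on a hypothetical attribute called spider.items (adapt the name to however your spider actually collects data):

import csv
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

rows = spider.items  # hypothetical attribute holding a list of dicts
if rows:
    # write the collected dictionaries to a local CSV file
    with open('test.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

    # upload the CSV to Google Drive with the same pydrive calls as in the question
    gauth = GoogleAuth()
    gauth.LocalWebserverAuth()
    drive = GoogleDrive(gauth)
    gfile = drive.CreateFile()
    gfile.SetContentFile('test.csv')
    gfile.Upload()

    # remove the local copy once the upload has finished
    os.remove('test.csv')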
Answer 1 (score: 1)
You need to write a custom pipeline or feed exporter.
For example, if your crawl is small and the results fit in memory, a simple pipeline like this will do:
# myproject/pipelines.py
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive


class GdrivePipeline:

    def open_spider(self, spider):
        # collect every item in memory while the spider runs
        self.data = []

    def process_item(self, item, spider):
        self.data.append(item)
        return item

    def close_spider(self, spider):
        # authenticate and upload once the crawl has finished
        gauth = GoogleAuth()
        gauth.LocalWebserverAuth()
        drive = GoogleDrive(gauth)
        file2 = drive.CreateFile()
        # write self.data to a local file, then attach it with
        # file2.SetContentFile(...) before uploading
        file2.Upload()
Then activate it in your settings:
ITEM_PIPELINES = {
    'myproject.pipelines.GdrivePipeline': 999,
}
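For completeness, a minimal sketch of what the "write self.data to a local file" step in close_spider could look like, assuming the items are flat dictionaries and using a throwaway local file named output.csv (both the format and the name are arbitrary choices, not part of the original answer):

import csv
import os

    def close_spider(self, spider):
        if not self.data:
            return
        gauth = GoogleAuth()
        gauth.LocalWebserverAuth()
        drive = GoogleDrive(gauth)

        # dump the collected items to a local CSV file first
        with open('output.csv', 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=self.data[0].keys())
            writer.writeheader()
            writer.writerows(self.data)

        # attach the local file, upload it, then remove the local copy
        file2 = drive.CreateFile()
        file2.SetContentFile('output.csv')
        file2.Upload()
        os.remove('output.csv')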