如何在google-colaboratory上永久上传数据?

时间:2018-05-19 13:52:51

标签: python google-colaboratory

当我使用以下代码上传数据时,一旦断开连接,数据就会消失。

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

请建议我上传数据的方法,以便即使在数天断开连接后数据仍保持完整。

2 个答案:

答案 0 :(得分:1)

我将数据永久保存在google驱动器中的.zip文件中,并使用以下代码将其上传到google colabs VM。

将其粘贴到单元格中,然后更改file_id。您可以在google drive中找到文件网址中的file_id。 (右键单击文件 - >获取可共享链接 - >打开后找到URL的一部分?id =)

#@title uploader
file_id = "1BuM11fJJ1qdZH3VbQ-GwPlK5lAvXiNDv" #@param {type:"string"}
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# PyDrive reference:
# https://googledrive.github.io/PyDrive/docs/build/html/index.html


from google.colab import auth
auth.authenticate_user()

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

# Replace the assignment below with your file ID
# to download a different file.
#
# A file ID looks like: 1gLBqEWEBQDYbKCDigHnUXNTkzl-OslSO

import io
from googleapiclient.http import MediaIoBaseDownload

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
  # _ is a placeholder for a progress object that we ignore.
  # (Our file is small, so we skip reporting progress.)
  _, done = downloader.next_chunk()

fileId = drive.CreateFile({'id': file_id }) #DRIVE_FILE_ID is file id example: 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
print(fileId['title'])  
fileId.GetContentFile(fileId['title'])  # Save Drive file as a local file

!unzip {fileId['title']}

答案 1 :(得分:0)

在GDrive中保存数据很好(@skaem)。

如果您的数据包含代码,我建议您在colab笔记本的开头简单git clone来自Github(或任何其他代码版本控制服务)的源存储库。

通过这种方式,您可以离线开发,并在需要时使用最新代码在云中执行实验。