I am currently working with a 10 GB dataset. I have uploaded it to Google Cloud Storage, but I don't know how to import it into Google Colab.
Answer 0 (score: 12)
from google.colab import auth
auth.authenticate_user()
After you run this, a link is generated; click it and complete the sign-in.
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
Use this to install gcsfuse on Colab. Cloud Storage FUSE is an open source FUSE adapter that lets you mount Cloud Storage buckets as file systems on Colab, Linux, or macOS.
!mkdir folderOnColab
!gcsfuse folderOnBucket/content/ folderOnColab
Use this to mount the directory. (folderOnBucket is the GCS bucket URL without the gs:// part.)
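If you only have the full gs:// URL at hand, the bucket name can be separated from the rest with a few lines of Python. This is a minimal sketch; `split_gcs_url` is a hypothetical helper name and `my-bucket` is a made-up bucket, neither comes from gcsfuse or Colab.

```python
from typing import Tuple


def split_gcs_url(url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/dir' into ('bucket', 'path/to/dir').

    The 'gs://' prefix is optional; surrounding slashes are ignored.
    """
    prefix = "gs://"
    if url.startswith(prefix):
        url = url[len(prefix):]
    bucket, _, path = url.strip("/").partition("/")
    if not bucket:
        raise ValueError("no bucket name found in URL")
    return bucket, path


# Hypothetical URL, used only for illustration.
bucket_name, object_prefix = split_gcs_url("gs://my-bucket/data/train/")
print(bucket_name, object_prefix)
```

The first element is what goes in place of folderOnBucket in the mount command above.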
For further reading, see the documentation: https://cloud.google.com/storage/docs/gcs-fuse
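Once the bucket is mounted, its objects show up as ordinary files, so a quick sanity check is to list them with their sizes. The sketch below assumes the mount succeeded; `list_mounted_files` is a hypothetical helper name, and it uses only the standard library.

```python
from pathlib import Path


def list_mounted_files(mount_dir):
    """Return sorted (relative_path, size_in_bytes) pairs for files under mount_dir."""
    root = Path(mount_dir)
    if not root.is_dir():
        raise NotADirectoryError(
            f"{mount_dir} is not a directory (is the bucket mounted?)"
        )
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```

For example, calling `list_mounted_files("folderOnColab")` after the gcsfuse command should show the 10 GB dataset files if the mount worked.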
Answer 1 (score: 2)
The documentation covers this in External data: Drive, Sheets, and Cloud Storage...
Answer 2 (score: 2)
Using a dedicated service account and Python:
from google.oauth2 import service_account
from google.cloud import storage
import pandas as pd
import json
import filecmp
Provide the service account key as a str:
SERVICE_ACCOUNT = json.loads(r"""{
  "type": "service_account",
  "project_id": "[REPLACE WITH YOUR FILE]",
  "private_key_id": "[REPLACE WITH YOUR FILE]",
  "private_key": "[REPLACE WITH YOUR FILE]",
  "client_email": "[REPLACE WITH YOUR FILE]",
  "client_id": "[REPLACE WITH YOUR FILE]",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "[REPLACE WITH YOUR FILE]"
}""")
BUCKET = "[NAME OF YOUR BUCKET TO READ/WRITE YOUR DATA]"
Create the client using the service account credentials:
credentials = service_account.Credentials.from_service_account_info(
    SERVICE_ACCOUNT,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
# Use storage.Client so that the name `client` is not rebound over an
# imported module; that would break if this cell were run a second time.
client = storage.Client(
    credentials=credentials,
    project=credentials.project_id,
)
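Since the key is pasted in by hand, it may be worth checking that no placeholder was left behind before building the credentials. This is only a sketch: `unfilled_fields` is a hypothetical helper, and `EXPECTED_KEYS` is simply taken from the JSON shown above, not an authoritative list of what the library requires.

```python
PLACEHOLDER = "[REPLACE WITH YOUR FILE]"

# Assumption: these are the keys from the JSON in this answer that must be
# filled in from the downloaded key file.
EXPECTED_KEYS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "token_uri",
)


def unfilled_fields(info):
    """Return the expected keys that are missing, empty, or still a placeholder."""
    return [
        key
        for key in EXPECTED_KEYS
        if not info.get(key) or info[key] == PLACEHOLDER
    ]
```

Running `unfilled_fields(SERVICE_ACCOUNT)` should return an empty list once every value has been copied from your key file.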
Upload and download helpers:
def save_file(local_filename, remote_filename):
    """Upload a local file to BUCKET under the name remote_filename."""
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.upload_from_filename(local_filename)

def download_file(local_filename, remote_filename):
    """Download remote_filename from BUCKET to a local file."""
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(remote_filename)
    blob.download_to_filename(local_filename)
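Let's check the round trip with a CSV file generated by Pandas:
df_test = pd.DataFrame(
    {"col1": [1, 2, 3],
     "col2": [4, 5, 6]}
)
# to_csv returns None when given a path, so keep the DataFrame and write it separately.
df_test.to_csv(path_or_buf="/tmp/test.csv")
save_file("/tmp/test.csv", "test.csv")
download_file("/tmp/test2.csv", "test.csv")
# shallow=False forces a byte-by-byte comparison of the two files.
assert filecmp.cmp("/tmp/test.csv", "/tmp/test2.csv", shallow=False)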
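For a file as large as the 10 GB dataset in the question, another way to check a transfer is to compare digests, which can be computed in chunks and stored for later without keeping both copies on disk. This is a sketch using only the standard library; `file_sha256` is a hypothetical helper and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the files from the test above, `file_sha256("/tmp/test.csv") == file_sha256("/tmp/test2.csv")` should hold if the round trip preserved the contents.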