Google Datalab: how to import a pickle file

Date: 2016-09-24 06:02:43

Tags: google-cloud-datalab

Is it possible for Google Datalab to read pickle/joblib models from Google Cloud Storage using the %%storage clause?

This question is related to: Is text the only content type for %%storage magic function in datalab

1 Answer:

Answer 0 (score: 4):

Run the following in a separate empty cell:

%%storage read --object <path-to-gcs-bucket>/my_pickle_file.pkl --variable test_pickle_var

Then run the following code:

import pickle
from io import BytesIO

pickle.load(BytesIO(test_pickle_var))

I used the code below to upload a pandas DataFrame to Google Cloud Storage as a pickle file and to read it back:

from datalab.context import Context
import datalab.storage as storage
import pandas as pd
from io import BytesIO
import pickle

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

# Create a local pickle file
df.to_pickle('my_pickle_file.pkl')

# Create a bucket in GCS
sample_bucket_name = Context.default().project_id + '-datalab-example'
sample_bucket_path = 'gs://' + sample_bucket_name
sample_bucket = storage.Bucket(sample_bucket_name)
if not sample_bucket.exists():
    sample_bucket.create()

# Write pickle to GCS
sample_item = sample_bucket.item('my_pickle_file.pkl')
with open('my_pickle_file.pkl', 'rb') as f:
    sample_item.write_to(bytearray(f.read()), 'application/octet-stream')

# Read Method 1 - Read pickle from GCS using %storage read (note single % for line magic)
path_to_pickle_in_gcs = sample_bucket_path + '/my_pickle_file.pkl'
%storage read --object $path_to_pickle_in_gcs --variable remote_pickle_1
df_method1 = pickle.load(BytesIO(remote_pickle_1))
print(df_method1)

# Read Alternate Method 2 - Read pickle from GCS using storage.Bucket.item().read_from()
remote_pickle_2 = sample_bucket.item('my_pickle_file.pkl').read_from()
df_method2 = pickle.load(BytesIO(remote_pickle_2))
print(df_method2)
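
Since the question also asks about joblib models, the same byte-buffer approach should carry over, because joblib.load accepts a file-like object as well as a filename. Below is a minimal sketch, assuming joblib is installed and that a model was previously saved to the bucket with joblib.dump; the object name my_model.joblib is hypothetical:

import joblib
from io import BytesIO

# Hypothetical: 'my_model.joblib' is assumed to have been uploaded earlier
# with joblib.dump(), the same way the pickle file was uploaded above.
model_bytes = sample_bucket.item('my_model.joblib').read_from()
model = joblib.load(BytesIO(model_bytes))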

Note: there is a known issue where the %storage command does not work if it is the first line in a cell. Put a comment or a line of Python code on the first line instead.
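
For illustration, a cell like the following should avoid the issue; the bucket path is a placeholder:

# A comment (or any Python statement) on the first line works around the known issue
%storage read --object gs://my-bucket/my_pickle_file.pkl --variable remote_pickle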