Question

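I am trying to load a pickled pandas DataFrame from Google Cloud Storage into App Engine.

I have been using blob.download_to_file() to read the byte stream into pandas, but I get the following error: UnpicklingError: invalid load key, m. I have tried seeking back to the start of the stream, to no avail, and I am fairly sure I am missing something fundamental.

When I instead pass an open file object and read from that, I get an UnsupportedOperation: write error.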

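from io import BytesIO

import pandas as pd
from google.cloud import storage
from google.oauth2 import service_account

# Assumed to be defined elsewhere: service_account_credentials_path, projectid

def get_byte_fileobj(project, bucket, path) -> BytesIO:
    """Download the blob at `path` into an in-memory byte stream."""
    blob = _get_blob(bucket, path, project)
    byte_stream = BytesIO()
    blob.download_to_file(byte_stream)
    byte_stream.seek(0)  # rewind so the reader starts at the first byte
    return byte_stream

def _get_blob(bucket_name, path, project):
    """Return a blob handle for `path` in the bucket `bucket_name`."""
    # Use explicit service-account credentials if a key path is set,
    # otherwise let the client fall back to its default credentials.
    credentials = service_account.Credentials.from_service_account_file(
        service_account_credentials_path) if service_account_credentials_path else None
    storage_client = storage.Client(project=project, credentials=credentials)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(path)
    return blob

fileobj = get_byte_fileobj(projectid, 'backups', 'Matches/Matches.pickle')
pd.read_pickle(fileobj)  # this is the call that raises UnpicklingError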

Ideally pandas would read straight from the pickle, since all my GCS backups are in that format, but I am open to suggestions.

Answer 1

The pandas.read_pickle() method takes a file path string as its argument, not a file handle/object:

pandas.read_pickle(path, compression='infer') 
   Load pickled pandas object (or any object) from file.

path : str 
   File path where the pickled object will be loaded.

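If you are using the second-generation standard environment or the flexible environment, you can try using an actual /tmp file instead of BytesIO.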
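
A minimal sketch of the /tmp approach, assuming `blob` is a google.cloud.storage Blob such as the one returned by `_get_blob` in the question. The helper name `blob_as_tmp_file` is hypothetical; `download_to_filename` is the Blob method that writes to a local path.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def blob_as_tmp_file(blob, suffix=".pickle"):
    """Download `blob` to a temporary file under /tmp and yield its path.

    `blob` is assumed to expose download_to_filename(path), as
    google.cloud.storage.Blob does. The file is deleted on exit.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix, dir="/tmp")
    os.close(fd)  # only the path is needed; the blob reopens the file itself
    try:
        blob.download_to_filename(tmp_path)
        yield tmp_path
    finally:
        os.remove(tmp_path)


# Usage with pandas and a real blob (not executed here):
# with blob_as_tmp_file(blob) as path:
#     df = pd.read_pickle(path)
```

Removing the file afterwards matters because, as far as I know, /tmp on App Engine is backed by instance memory, so leftover files count against the instance's RAM.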

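Otherwise, you will have to find another way of loading the data into pandas, one that supports file objects/descriptors. The general approach is described in "How to restore Tensorflow model from Google bucket without writing to filesystem?" (the context is different, but the overall idea is the same).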
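
One way to read from a file object without touching the filesystem is to unpickle the in-memory stream with the standard-library pickle module. This is only a sketch under assumptions: `load_pickle_from_blob` is a hypothetical helper, `blob` is assumed to be a google.cloud.storage Blob, and the stored object is assumed to be a plain, uncompressed pickle. It may also be worth checking the installed pandas version first, since newer releases document read_pickle as accepting file-like objects as well.

```python
import pickle
from io import BytesIO


def load_pickle_from_blob(blob):
    """Unpickle the contents of `blob` entirely in memory.

    `blob` is assumed to expose download_to_file(file_obj), as
    google.cloud.storage.Blob does. Only unpickle data you trust:
    loading a pickle can execute arbitrary code.
    """
    byte_stream = BytesIO()
    blob.download_to_file(byte_stream)
    byte_stream.seek(0)  # rewind before reading
    return pickle.load(byte_stream)


# Usage with a real blob (not executed here); pandas must be installed
# for a pickled DataFrame to be reconstructed:
# df = load_pickle_from_blob(blob)
```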

How to load a pickled DataFrame from GCS into App Engine
