How to fix a memory leak when uploading a file to Google Cloud Storage?

Asked: 2019-06-27 05:37:08

Tags: python memory-leaks google-cloud-platform google-cloud-storage google-cloud-python

A memory leak was detected via memory_profiler. Since such a large file will be uploaded from a 128MB GCF (Cloud Function) or an f1-micro GCE instance, how can I prevent this memory leak?

✗ python -m memory_profiler tests/test_gcp_storage.py
67108864

Filename: tests/test_gcp_storage.py

Line #    Mem usage    Increment   Line Contents
================================================
    48   35.586 MiB   35.586 MiB   @profile
    49                             def test_upload_big_file():
    50   35.586 MiB    0.000 MiB     from google.cloud import storage
    51   35.609 MiB    0.023 MiB     client = storage.Client()
    52                             
    53   35.609 MiB    0.000 MiB     m_bytes = 64
    54   35.609 MiB    0.000 MiB     filename = int(datetime.utcnow().timestamp())
    55   35.609 MiB    0.000 MiB     blob_name = f'test/{filename}'
    56   35.609 MiB    0.000 MiB     bucket_name = 'my_bucket'
    57   38.613 MiB    3.004 MiB     bucket = client.get_bucket(bucket_name)
    58                             
    59   38.613 MiB    0.000 MiB     with open(f'/tmp/{filename}', 'wb+') as file_obj:
    60   38.613 MiB    0.000 MiB       file_obj.seek(m_bytes * 1024 * 1024 - 1)
    61   38.613 MiB    0.000 MiB       file_obj.write(b'\0')
    62   38.613 MiB    0.000 MiB       file_obj.seek(0)
    63                             
    64   38.613 MiB    0.000 MiB       blob = bucket.blob(blob_name)
    65  102.707 MiB   64.094 MiB       blob.upload_from_file(file_obj)
    66                             
    67  102.715 MiB    0.008 MiB     blob = bucket.get_blob(blob_name)
    68  102.719 MiB    0.004 MiB     print(blob.size)

Moreover, if the file is not opened in binary mode, the memory leak is twice the file size:

67108864
Filename: tests/test_gcp_storage.py

Line #    Mem usage    Increment   Line Contents
================================================
    48   35.410 MiB   35.410 MiB   @profile
    49                             def test_upload_big_file():
    50   35.410 MiB    0.000 MiB     from google.cloud import storage
    51   35.441 MiB    0.031 MiB     client = storage.Client()
    52                             
    53   35.441 MiB    0.000 MiB     m_bytes = 64
    54   35.441 MiB    0.000 MiB     filename = int(datetime.utcnow().timestamp())
    55   35.441 MiB    0.000 MiB     blob_name = f'test/{filename}'
    56   35.441 MiB    0.000 MiB     bucket_name = 'my_bucket'
    57   38.512 MiB    3.070 MiB     bucket = client.get_bucket(bucket_name)
    58                             
    59   38.512 MiB    0.000 MiB     with open(f'/tmp/{filename}', 'w+') as file_obj:
    60   38.512 MiB    0.000 MiB       file_obj.seek(m_bytes * 1024 * 1024 - 1)
    61   38.512 MiB    0.000 MiB       file_obj.write('\0')
    62   38.512 MiB    0.000 MiB       file_obj.seek(0)
    63                             
    64   38.512 MiB    0.000 MiB       blob = bucket.blob(blob_name)
    65  152.250 MiB  113.738 MiB       blob.upload_from_file(file_obj)
    66                             
    67  152.699 MiB    0.449 MiB     blob = bucket.get_blob(blob_name)
    68  152.703 MiB    0.004 MiB     print(blob.size)
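
The doubling in text mode is plausibly because a text-mode stream yields str objects that must be encoded back to bytes before sending, so two full copies of the data are held at once; that reading of the numbers above is an inference, not something confirmed from the SDK source. Opening the file in binary mode avoids the extra copy. A minimal sketch, with placeholder bucket and file names:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my_bucket')
blob = bucket.blob('test/big_file')

# 'rb' yields bytes directly, so no str-to-bytes encoding copy is made.
with open('/tmp/big_file', 'rb') as file_obj:
    blob.upload_from_file(file_obj)

This only removes the extra copy; the baseline overhead of one full file size remains until a chunk size is configured, as the answer below shows.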

Gist: https://gist.github.com/northtree/8b560a6b552a975640ec406c9f701731

1 Answer:

Answer 0 (score: 0)

To limit the amount of memory used during the upload, you need to explicitly configure a chunk size on the blob before calling upload_from_file():

blob = bucket.blob(blob_name, chunk_size=10*1024*1024)
blob.upload_from_file(file_obj)
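
For context, a sketch of the full fixed upload, reusing the placeholder names from the question (the 10 MiB chunk size is one reasonable choice, not a required value):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my_bucket')

# With chunk_size set, the client performs a resumable upload that
# streams the file in fixed-size chunks instead of buffering it whole.
blob = bucket.blob('test/big_file', chunk_size=10 * 1024 * 1024)

with open('/tmp/big_file', 'rb') as file_obj:
    blob.upload_from_file(file_obj)

Note that chunk_size must be a multiple of 256 KB (262144 bytes); smaller chunks cap peak memory lower at the cost of more upload requests.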

I agree that this is poor default behavior in the Google client SDK, and that the workaround is poorly documented as well.