Listing objects in GCS also lists the directory

Time: 2019-05-16 20:42:38

Tags: python-3.x google-cloud-storage

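I am trying to print a list of the objects (files) in a Google Cloud Storage bucket, but the result also includes the subdirectory itself, temp/. How can I exclude it? The Google API documentation does not indicate that this should happen.

My bucket: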

gs://my_bucket/temp

My code:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("my_bucket")
blobs = bucket.list_blobs(prefix="temp/", delimiter='/')

for blob in blobs:
    print(blob.name)

Result:

temp/
temp/2019-02-01_file1.csv
temp/2019-02-01_file2.csv
temp/2019-02-01_file3.csv
temp/2019-02-01_file4.csv

2 answers:

Answer 0 (score: 0)

You may want to try the script below. I edited it slightly from the sample script in the GCS documentation.

from google.cloud import storage

def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        a/1.txt
        a/b/2.txt

    If you just specify prefix='a/', you'll get back:

        a/1.txt
        a/b/2.txt

    However, if you specify prefix='a/' and delimiter='/', you'll get back:

        a/1.txt
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        # Print each name with the prefix removed.
        print(blob.name.replace(prefix, ""))

    if delimiter:
        # blobs.prefixes holds the "subfolders" found under the prefix.
        # A separate loop variable avoids shadowing the `prefix` argument.
        print('Prefixes:')
        for sub_prefix in blobs.prefixes:
            print(sub_prefix)

if __name__ == '__main__':
    list_blobs_with_prefix('[your bucket name]', '[prefix]')
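
One caveat with the `replace` call in the script above: `str.replace` removes every occurrence of the prefix, not just the leading one, so a name such as temp/temp/file.csv would lose both parts. A minimal sketch of a safer alternative follows. The helper name `strip_prefix` is hypothetical and not part of the GCS client library.

```python
def strip_prefix(name, prefix):
    """Return name without a leading prefix; leave it unchanged otherwise."""
    # Hypothetical helper: only the start of the string is examined,
    # so later occurrences of the prefix are preserved.
    if name.startswith(prefix):
        return name[len(prefix):]
    return name


relative = strip_prefix("temp/2019-02-01_file1.csv", "temp/")
nested = strip_prefix("temp/temp/file.csv", "temp/")
print(relative)
print(nested)
```

Note that the object named exactly like the prefix (temp/ in the question) reduces to an empty string, so it still needs to be filtered out separately.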

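Answer 1 (score: 0)

I think this approach is what you are looking for:

If you want to avoid processing the "subfolder" blob, the quickest way is to simply skip it while iterating over the blobs.

Here is your code with a few minor tweaks. Also, if you don't want "temp/" displayed when listing the blobs, I would use a "replace" approach similar to the one in Russel H's answer.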

from google.cloud import storage

my_prefix = "temp/"
my_bucket = "my_bucket_name"
storage_client = storage.Client()
bucket = storage_client.get_bucket(my_bucket)
blobs = bucket.list_blobs(prefix = my_prefix, delimiter = '/')

for blob in blobs:
    if blob.name != my_prefix:  # ignore the subfolder itself
        print(" Displaying " + blob.name.replace(my_prefix, ""))  # if you only want to display the name of the blob
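
The same idea can be wrapped in a small reusable function. This is only a sketch under a stated assumption: that a "folder" placeholder is an object whose name ends with "/" (as temp/ does in the question). The name `iter_file_names` is hypothetical. It accepts any iterable of objects with a `.name` attribute, so the stand-in objects below could be swapped for the result of `bucket.list_blobs(prefix="temp/", delimiter="/")`.

```python
from types import SimpleNamespace


def iter_file_names(blobs, prefix):
    """Yield blob names relative to prefix, skipping "folder" placeholders."""
    for blob in blobs:
        # Assumption: a "folder" placeholder is an object whose name ends with "/".
        if blob.name.endswith("/"):
            continue
        name = blob.name
        # Strip the prefix only from the start of the name.
        if name.startswith(prefix):
            name = name[len(prefix):]
        yield name


# Stand-in objects for illustration only; no GCS call is made here.
fake_blobs = [
    SimpleNamespace(name="temp/"),
    SimpleNamespace(name="temp/2019-02-01_file1.csv"),
    SimpleNamespace(name="temp/2019-02-01_file2.csv"),
]

names = list(iter_file_names(fake_blobs, "temp/"))
print(names)
```

Checking for a trailing "/" rather than comparing against the prefix also skips any other placeholder objects returned by the listing, which may or may not be what you want.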