I already have a GCloud storage bucket organized by label, as follows:
gs://my_bucket/dataset/label1/
gs://my_bucket/dataset/label2/
...
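Each label folder contains photos. I want to generate the required CSV, as explained here, but since every folder holds hundreds of photos I don't know how to do it programmatically. The CSV file should look like this: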
gs://my_bucket/dataset/label1/photo1.jpg,label1
gs://my_bucket/dataset/label1/photo12.jpg,label1
gs://my_bucket/dataset/label2/photo7.jpg,label2
...
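Answer 0 (score: 0)
You need to list every file inside the dataset folder with its full path, then parse each path to get the name of the folder that contains the file, which in this case is the label you want to use. This can be done in several ways. Here are two examples you can build your code on:
gsutil has a command that lists bucket contents (gsutil ls); you can then parse its output with a bash script: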
# Create the CSV file and define the bucket path
bucket_path="gs://buckbuckbuckbuck/dataset/"
filename="labels_csv_bash.csv"
: > "$filename" # Create the file, or empty it if it already exists
old_ifs=$IFS
IFS=$'\n' # Split the listing on newlines only, so paths containing spaces stay intact
# List every .jpg file inside the bucket's folder. ** matches recursively.
for i in $(gsutil ls "${bucket_path}**.jpg")
do
    # Split the path on / and take the second field from the end (the parent folder).
    label=$(echo "$i" | rev | cut -d'/' -f2 | rev)
    # No space after the comma, to match the expected CSV format
    echo "$i,$label" >> "$filename"
done
IFS=$old_ifs # Restore the original value
gsutil cp "$filename" "$bucket_path"
This can also be done with the Google Cloud Client Libraries, which are available for several languages. Here is an example in Python:
# Imports the Google Cloud client library
from google.cloud import storage
# Instantiates a client
storage_client = storage.Client()
# Name of the existing bucket and of the folder that holds the label folders
bucket_name = 'my_bucket'
path_in_bucket = 'dataset'
blobs = storage_client.list_blobs(bucket_name, prefix=path_in_bucket)
# Read the blobs, parse each name and write the CSV file
filename = 'labels_csv_python.csv'
with open(filename, 'w') as f:
    for blob in blobs:
        if blob.name.endswith('.jpg'):
            # Object names always use forward slashes, so build the URI by hand
            # rather than with os.path.join, which uses backslashes on Windows
            bucket_path = f'gs://{bucket_name}/{blob.name}'
            # The label is the folder that directly contains the file
            label = blob.name.split('/')[-2]
            # No space after the comma, to match the expected CSV format
            f.write(f'{bucket_path},{label}\n')
# Upload the CSV file to the bucket
bucket = storage_client.get_bucket(bucket_name)
destination_blob_name = f'{path_in_bucket}/{filename}'
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(filename)
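The parsing step can also be separated from the storage calls, so it can be checked without touching a bucket. The following is only a sketch under a few assumptions: the helper names `label_from_blob_name`, `build_rows` and `rows_to_csv` are made up for illustration and are not part of any Google library, and the label is assumed to be the folder that directly contains each image.

```python
import csv
import io


def label_from_blob_name(blob_name):
    """Return the parent folder of a name such as 'dataset/label1/photo1.jpg'."""
    parts = blob_name.split("/")
    if len(parts) < 2:
        raise ValueError(f"no parent folder in {blob_name!r}")
    return parts[-2]


def build_rows(bucket_name, blob_names, extension=".jpg"):
    """Build (gs:// URI, label) pairs for the names ending in `extension`."""
    rows = []
    for name in blob_names:
        if name.lower().endswith(extension):
            uri = f"gs://{bucket_name}/{name}"
            rows.append((uri, label_from_blob_name(name)))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text: no header, no space after the comma."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical object names, used only to show the output format
    names = [
        "dataset/label1/photo1.jpg",
        "dataset/label2/photo7.jpg",
        "dataset/readme.txt",
    ]
    print(rows_to_csv(build_rows("my_bucket", names)), end="")
```

With a real bucket, `blob_names` could be fed from the listing call used above, for example `(blob.name for blob in storage_client.list_blobs(bucket_name, prefix=path_in_bucket))`.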
Answer 1 (score: 0)
For anyone who, like me, is looking for a way to create a .csv file for batch processing in Google AutoML but does not need the label column:
# Create the CSV file and define the bucket path
bucket_path="gs://YOUR_BUCKET/FOLDER/"
filename="THE_FILENAME_YOU_WANT.csv"
: > "$filename" # Create the file, or empty it if it already exists
old_ifs=$IFS
IFS=$'\n' # Split the listing on newlines only, so paths containing spaces stay intact
# List every file with the chosen extension inside the bucket's folder.
# To use another extension, change **.png in the next line to **.your_extension. ** matches recursively.
for i in $(gsutil ls "${bucket_path}**.png")
do
    echo "$i" >> "$filename"
done
IFS=$old_ifs # Restore the original value
gsutil cp "$filename" "$bucket_path"
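For readers who prefer Python, a label-free variant might look like the sketch below. The function names `to_batch_lines` and `write_batch_csv` are hypothetical; the only library calls are `storage.Client()` and `list_blobs`, the same ones used in the Python example of Answer 0. It assumes one gs:// URI per line is what the batch job expects, as in the bash script above.

```python
def to_batch_lines(bucket_name, blob_names, extension=".png"):
    """Return one gs:// URI per name ending in `extension`, in the order given."""
    return [
        f"gs://{bucket_name}/{name}"
        for name in blob_names
        if name.endswith(extension)
    ]


def write_batch_csv(bucket_name, prefix, extension, out_path):
    """List objects under `prefix` and write their URIs to `out_path`, one per line."""
    # Imported here so that to_batch_lines can be used and tested
    # without the google-cloud-storage package installed.
    from google.cloud import storage

    client = storage.Client()
    names = (blob.name for blob in client.list_blobs(bucket_name, prefix=prefix))
    lines = to_batch_lines(bucket_name, names, extension)
    with open(out_path, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return len(lines)
```

A call such as `write_batch_csv("YOUR_BUCKET", "FOLDER/", ".png", "THE_FILENAME_YOU_WANT.csv")` would write the file locally; uploading it can be done with `gsutil cp` as in the script above.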