Example:

Source file path: gs://logbucket/mylog/2020/07/22/log.csv

Expected target: gs://logbucket/hivelog/2020/07/22/log.csv

My code:

from google.cloud import storage

def hello_gcs_generic(data, context):
    # Bucket and object name of the file that triggered the function
    sourcebucket = format(data['bucket'])
    source_file = format(data['name'])
    # Expects names shaped like mylog/<year>/<month>/<day>/<filename>
    year = source_file.split("/")[1]
    month = source_file.split("/")[2]
    day = source_file.split("/")[3]
    filename = source_file.split("/")[4]
    print(year)
    print(month)
    print(day)
    print(filename)
    print(sourcebucket)
    print(source_file)
    storage_client = storage.Client()
    source_bucket = storage_client.bucket(sourcebucket)
    source_blob = source_bucket.blob(source_file)
    # The destination is the same bucket as the source
    destination_bucket = storage_client.bucket(sourcebucket)
    destination_blob_name = 'hivelog/year='+year+'/month='+month+'/day='+day+'/'+filename
    blob_copy = source_bucket.copy_blob(
        source_blob, destination_bucket, destination_blob_name
    )
    # Remove the original object once it has been copied
    source_blob.delete()
    print(
        "Blob {} in bucket {} copied to blob {} in bucket {}.".format(
            source_blob.name,
            source_bucket.name,
            blob_copy.name,
            destination_bucket.name,
        )
    )

Output:

Can you see where this comes from? I also end up with folders like year=year=2020 in there.

I can't solve this problem.

Answer 1

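An educated guess is that the source path you are copying from actually has the form

foo/year=2020/month=42

so when you split on slashes you get components such as

year=2020

and when you then recombine those components, the line destination_blob_name = 'hivelog/year='+year+'/month='+month+ ... adds another year= / month= / ... prefix on top of them. And there you are:

year=year=year=

after 3 iterations...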

Are you also sure you are not iterating over files you have already copied? That would cause this too.
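To make the compounding concrete, here is a minimal sketch of the question's path logic as a pure function. The helper name `destination_for` is hypothetical, and the loop assumes that each copy fires the function again on the newly written object.

```python
def destination_for(name):
    """Rebuild the destination name the way the question's function does."""
    # Same positions as in the question: parts 1..4 are year, month, day, filename.
    _, year, month, day, filename = name.split("/")
    return "hivelog/year=" + year + "/month=" + month + "/day=" + day + "/" + filename


name = "mylog/2020/07/22/log.csv"
for _ in range(3):
    # Each pass stands in for one more trigger on the freshly copied object.
    name = destination_for(name)
    print(name)
```

The first pass gives the intended hivelog/year=2020/... name; every later pass splits that name again and stacks one more year=, month= and day= prefix on each component.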

Answer 2

You are writing to the same bucket you are copying from:

destination_bucket = storage_client.bucket(sourcebucket)

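Every time a new file is added to the bucket, the Cloud Function is triggered again.

You either need to use two different buckets, or add a condition based on the first part of the path: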

top_level_directory = source_file.split("/")[0]
if top_level_directory == "mylog":
    # Do the copying
    pass
elif top_level_directory == "hivelog":
    # This is a file created by the function, do nothing
    pass
else:
    # We weren't expecting this top level directory
    pass
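One way to sketch that condition is a small pure helper that decides whether an object should be copied at all. The name `plan_copy` is hypothetical, and the five-part layout mylog/<year>/<month>/<day>/<filename> is assumed from the question.

```python
def plan_copy(name):
    """Return the destination object name, or None when nothing should be copied."""
    parts = name.split("/")
    if parts[0] != "mylog" or len(parts) != 5:
        # Objects under hivelog/ were written by the function itself, and
        # anything else is unexpected. Either way, skip instead of copying.
        return None
    _, year, month, day, filename = parts
    return f"hivelog/year={year}/month={month}/day={day}/{filename}"


print(plan_copy("mylog/2020/07/22/log.csv"))
print(plan_copy("hivelog/year=2020/month=07/day=22/log.csv"))
```

The Cloud Function could call this with data['name'] and return early when it gets None, so the trigger caused by its own copy does no further work.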

Answer 3

import os
import gcsfs


def hello_gcs_generic(data, context):
    fs = gcsfs.GCSFileSystem(project="Project_Name", token=os.getenv("GOOGLE_APPLICATION_CREDENTIALS"))
    source_filepath = f"{data['bucket']}/{data['name']}"
    destination_filepath = source_filepath.replace("mylog","hivelog")
    fs.cp(source_filepath,destination_filepath)
    print(f"Blob {data['name']} in bucket {data['bucket']} copied to hivelog")

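This should give you a head start on what you are trying to achieve. Replace Project_Name with the name of the GCP project your buckets live in.

It also assumes that your service account credentials are set up in a JSON file pointed to by the GOOGLE_APPLICATION_CREDENTIALS environment variable, which I take to be the case given your use of google.cloud storage.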
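One thing to watch in the function above: str.replace rewrites every occurrence of "mylog", so a bucket or file name that happens to contain that substring would be changed too. A minimal sketch of a stricter swap that touches only the leading directory follows; `swap_top_level` and its parameter names are hypothetical, and it is meant to be applied to data['name'] before the bucket is prepended.

```python
def swap_top_level(name, source_prefix="mylog", target_prefix="hivelog"):
    """Swap only the leading directory of an object name."""
    head, sep, rest = name.partition("/")
    if head != source_prefix:
        raise ValueError(f"expected {source_prefix!r} at the top level, got {head!r}")
    return target_prefix + sep + rest


print(swap_top_level("mylog/2020/07/22/log.csv"))
# str.replace also rewrites the later "mylog" in the file name:
print("mylog/2020/07/22/mylog.csv".replace("mylog", "hivelog"))
# The helper leaves the file name alone:
print(swap_top_level("mylog/2020/07/22/mylog.csv"))
```

Raising on an unexpected top-level directory also stops the function from acting on objects it wrote itself under hivelog/.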

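Granted, you could take "mylog" and "hivelog" as parameters and make this useful in other situations. Also, for splitting the file name, should you need to do that again, one line is enough: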

_,year,month,day,filename = data['name'].split('/')

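In this case the underscore is just a way of telling yourself and others that you do not intend to use that part of the split.

You can use extended unpacking to ignore multiple values, for example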

*_,month,day,filename = data['name'].split('/')

Or you can combine the two

*_,month,day,_ = data['name'].split('/')

Edit: link to the gcsfs documentation

GCP Cloud Function python - GCS copy files - duplicate files
