How to get Cloud Logging from Python when copying files to a GCP bucket

Time: 2020-09-19 09:03:02

Tags: python logging google-cloud-platform cloud bucket

I wrote a Python script that copies files from a local machine to a GCP bucket and captures logging information.

The gsutil rsync command works fine and the files are copied to the corresponding destination folders.

However, the log entries do not appear in the GCP Logs Viewer. A sample script is given below. Please advise.

## python3 /home/sant/multiprocessing_gs.py
from multiprocessing import Pool
from subprocess import Popen, PIPE, TimeoutExpired
import os
import sys
import logging as lg
import google.cloud.logging as gcl
from google.cloud.logging.handlers import CloudLoggingHandler

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/sant/key.json"
ftp_path1 = "/home/sant"
GCS_DATA_INGEST_BUCKET_URL = "dev2-ingest-manual"


class GcsMover:
    def __init__(self):
        self.folder_list = ["raw_amr", "osr_data"]
        self.logger = self.create_logger()

    def create_logger(self, log_name="Root_Logger", log_level=lg.INFO):
        try:
            log_format = lg.Formatter("%(levelname)s %(asctime)s - %(message)s")
            client = gcl.Client()
            log_handler = CloudLoggingHandler(client)
            log_handler.setFormatter(log_format)
            logger = lg.getLogger(log_name)
            logger.setLevel(log_level)
            logger.addHandler(log_handler)
            return logger
        except Exception as e:
            sys.exit(f"WARNING - Invalid cloud logging: {e}")

    def execute_jobs(self, cmd):
        try:
            gs_sp = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, shell=True)
            print(f"starting process with Pid {str(gs_sp.pid)} for command {cmd}")
            self.logger.info(f"starting process with Pid {str(gs_sp.pid)} for command {cmd}")
            sp_out, sp_err = gs_sp.communicate(timeout=3600)
        except OSError:
            # gs_sp may not exist if Popen itself failed, so log the command.
            self.logger.error(f"Processing aborted for command {cmd}")
        except TimeoutExpired:
            gs_sp.kill()
            self.logger.error(f"Processing aborted for Pid {str(gs_sp.pid)}")
        else:
            if gs_sp.returncode:
                self.logger.error(f"Failure due to {sp_err} for Pid {str(gs_sp.pid)} and command {cmd}")
            else:
                print(f"Loading successful for Pid {str(gs_sp.pid)}")
                self.logger.info(f"Loading successful for Pid {str(gs_sp.pid)}")

    def move_files(self):
        command_list = []
        for folder in self.folder_list:
            gs_command = f"gsutil -m rsync -r {ftp_path1}/{folder} gs://{GCS_DATA_INGEST_BUCKET_URL}/{folder}"
            command_list.append(gs_command)
        pool = Pool(processes=2, maxtasksperchild=1)
        pool.map(self.execute_jobs, iterable=command_list)
        pool.close()
        pool.join()


def main():
    gsu = GcsMover()
    gsu.move_files()


if __name__ == "__main__":
    main()
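
A minimal way to rule out credential or API problems is to write one entry synchronously, bypassing both the CloudLoggingHandler (which hands records to a background transport thread) and multiprocessing. The sketch below assumes the same key file as the script; the log name connectivity_test is arbitrary:

# Sketch: verify Cloud Logging connectivity without the handler or the Pool.
import os
import google.cloud.logging as gcl

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/sant/key.json"

client = gcl.Client()
logger = client.logger("connectivity_test")  # arbitrary log name for this check
# log_text() sends the entry immediately, so it either appears in the
# Logs Viewer or raises on the spot.
logger.log_text("connectivity check from multiprocessing_gs.py", severity="INFO")

If this entry shows up but the script's entries do not, a plausible culprit is the handler's background transport: records queued in the parent process may never be flushed inside the forked Pool workers before they exit.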

2 Answers:

Answer 0 (score: 1):

The documentation explains how to log activity in a GCS bucket with Cloud Functions by using a storage trigger. I have tested it and it works for me; I used the same code as provided in the documentation:

def hello_gcs(event, context):
    """Background Cloud Function to be triggered by Cloud Storage.
       This generic function logs relevant data when a file is changed.

    Args:
        event (dict):  The dictionary with data specific to this type of event.
                       The `data` field contains a description of the event in
                       the Cloud Storage `object` format described here:
                       https://cloud.google.com/storage/docs/json_api/v1/objects#resource
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Stackdriver Logging
    """

    print('Event ID: {}'.format(context.event_id))
    print('Event type: {}'.format(context.event_type))
    print('Bucket: {}'.format(event['bucket']))
    print('File: {}'.format(event['name']))
    print('Metageneration: {}'.format(event['metageneration']))
    print('Created: {}'.format(event['timeCreated']))
    print('Updated: {}'.format(event['updated']))

For deployment, I used the following command:

gcloud functions deploy hello_gcs \
--runtime python37 \
--trigger-resource YOUR_TRIGGER_BUCKET_NAME \
--trigger-event google.storage.object.finalize
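
Once deployed, copying any file into the trigger bucket fires the function, and its print output can be read back from the function's logs (a sketch; the file name is arbitrary and the bucket placeholder is the one from the deploy command):

gsutil cp some_local_file.txt gs://YOUR_TRIGGER_BUCKET_NAME/
gcloud functions logs read hello_gcs --limit 10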

Answer 1 (score: 0):

Google Cloud Storage can log the operations performed on objects, as described in the documentation. You may need to activate audit logs in your project.

Since the script uses rsync, several operations are performed against GCS (details in the code of the command). As an overview: it checks whether each object already exists in the bucket (by listing the bucket); if it exists, it compares the hash of the local file with the hash of the remote file; and if the file has changed or did not exist before, it uploads it.

All of these operations are recorded in the Data Access logs, which you can access from the console.
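
The same entries can also be queried from the command line, for example (a sketch; substitute your own project ID):

gcloud logging read 'logName="projects/YOUR_PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access"' --limit 10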

If you also want to keep local logs (in case a local error is not recorded in the cloud), you can change the executed command by appending a redirection to a log file:

gsutil -m rsync -r /source/path gs://bucket/folder &> /path/to/log
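
Note that &> is a bash-ism. In the question's script, which launches gsutil through Popen rather than an interactive shell, the equivalent is to point the child's stdout and stderr at a file object. A minimal sketch, reusing the placeholder paths from the command above:

# Sketch: keep a local copy of gsutil's output when launching from Python.
import subprocess

cmd = "gsutil -m rsync -r /source/path gs://bucket/folder"
with open("/path/to/log", "ab") as log_file:
    # stderr=STDOUT merges both streams into the file, mirroring `&>`.
    subprocess.run(cmd, shell=True, stdout=log_file, stderr=subprocess.STDOUT)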