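Hi all, I'm new to GCP (moving over from AWS) and I have a lame question (please forgive me). We are building a traditional EDW on GCP. For scheduling we have Cloud Composer, and all of our code lives on Compute Engine (the equivalent of an EC2 instance in AWS).
How do I set up a workflow that runs my jobs on Compute Engine? Or what is the best way to achieve the same result?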
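More about our pipelines:
Pipeline 1: extracts millions of rows from a (legacy) SQL database, applies some ETL logic [cleaning, adding new columns, dropping columns, upper-casing column values, etc.], and finally loads into Redshift.
Pipeline 2: reads data from Google Sheets, applies the same ETL logic, and loads into a different Redshift table.
Pipeline 3: reads data from a Google API, cleans it, inserts it into Redshift, and so on.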
What is the best way to write these ETL workflows with Cloud Composer?
Any help is much appreciated!
----------PROJECT STRUCTURE & REQUIREMENTS------------

On my Compute Engine instance, the projects are laid out like this:

    /home/ubunutu/projects/project1
        /venv
        /src/job1.py (reads Google Sheets and loads into Cloud SQL)
        /src/job2.py (reads the Google AdWords API, does some cleaning, modifies attributes, and loads into Cloud SQL)
    /home/ubunutu/projects/project2
        /venv
        /src/job1.py (reads a file from GCS, performs cleaning, adds/removes columns, and loads into Cloud SQL)
        /src/job2.py (reads data from Cloud SQL table A, performs some modifications, and loads into Cloud SQL table B)

Now, in Composer, how do I orchestrate the complete workflow? The Python jobs sit on Compute Engine and I need to execute them there. The reason we use Compute Engine is to perform in-memory operations such as reading data into a dataframe, doing some group-bys, creating new columns, creating temporary files, and so on.

Or what would you suggest instead? For example, moving the whole sandbox into Composer's /data directory, like this:

    /data/projects/project1
        /venv
        /src/job1.py (reads Google Sheets and loads into Cloud SQL)
        /src/job2.py (reads the Google AdWords API, does some cleaning, modifies attributes, and loads into Cloud SQL)
    /data/projects/project2
        /venv
        /src/job1.py (reads a file from GCS, performs cleaning, adds/removes columns, and loads into Cloud SQL)
        /src/job2.py (reads data from Cloud SQL table A, performs some modifications, and loads into Cloud SQL table B)

In that case:
1. Will I be able to download temporary files onto the Composer server and perform operations on them?
2. Would I no longer need to create a venv if I place my code in Composer directly, since I can install PyPI packages through the console?

----------------------------------------------------------
Could you share your knowledge on this? Thanks a lot in advance!
Answer 0 (score: 1)
Here is a design pattern you can adapt to your needs: Task scheduling on Compute Engine with Cloud Scheduler.
Assuming you can set up a Pub/Sub topic and subscription, you can...
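As a rough sketch of how that pattern might look on the VM side: a scheduler publishes a small message naming the job, and a subscriber process on the Compute Engine instance turns it into the command to run. Everything here is an assumption made for illustration, not part of the linked pattern: the JSON message shape (`{"job": ...}`), the `JOBS` registry, and the `command_for_message` helper are hypothetical; only the directory layout comes from the question.

```python
import json

# Hypothetical registry mapping a job name to (project directory, script).
# The paths follow the layout described in the question.
JOBS = {
    "project1.job1": ("/home/ubunutu/projects/project1", "src/job1.py"),
    "project1.job2": ("/home/ubunutu/projects/project1", "src/job2.py"),
    "project2.job1": ("/home/ubunutu/projects/project2", "src/job1.py"),
    "project2.job2": ("/home/ubunutu/projects/project2", "src/job2.py"),
}


def command_for_message(data):
    """Translate a message body (bytes) into the command that runs one job.

    Assumes the body is UTF-8 JSON of the form {"job": "<name>"}.
    """
    payload = json.loads(data.decode("utf-8"))
    # An unknown job name raises KeyError instead of running anything.
    project_dir, script = JOBS[payload["job"]]
    # Use the project's own venv interpreter so its installed packages are found.
    return [project_dir + "/venv/bin/python", project_dir + "/" + script]


if __name__ == "__main__":
    print(command_for_message(b'{"job": "project1.job1"}'))
```

A subscriber callback running on the instance could pass each message's data to `command_for_message` and hand the resulting list to `subprocess.run`. Keeping the mapping in a fixed registry means a message can only trigger scripts you have listed, never an arbitrary command.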