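I have come across several ways to schedule Airflow tasks on Kubernetes pods, but I can't work out what the differences are or when one style should be preferred over another.
For context, my local Airflow test instance is configured to use KubernetesExecutor, and I am scheduling these tasks on a local Kubernetes cluster.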
dag = DAG('ex1', default_args=default_args, schedule_interval=None)
# Single Operator DAG
BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)
dag = DAG('ex2', default_args=default_args, schedule_interval=None)
# Why do you need to specify the executor when the executor is already configured via airflow.cfg?
BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
    executor_config={"KubernetesExecutor": {"image": "ubuntu:1604"}})
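The third style, KubernetesPodOperator, seems the most flexible (you can specify ANY container with ANY arguments), so maybe that is its only advantage? But given that I am only calling a bash script or a Python script, is there any difference between this approach and approach 1 or 2 (using BashOperator or PythonOperator)?
dag = DAG('ex3', default_args=default_args, schedule_interval=None)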
KubernetesPodOperator(
    namespace='default',
    image="ubuntu:1604",
    cmds=["/bin/bash", "-c"],
    arguments=["echo hello world"],
    labels={"foo": "bar"},
    name="EchoInAUbuntuContainer",
    task_id="testUbuntuEcho",
    get_logs=True,
    dag=dag)
Answer 0 (score: 0)
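KubernetesPodOperator is for the case where you are using a non-Kubernetes executor and you want to run an image on Kubernetes. If you are using KubernetesExecutor, then operators such as BashOperator will be scheduled on Kubernetes anyway (although if you don't specify an image, it is unclear which image they will use).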
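On the point about the image: as far as I can tell, the executor_config in ex2 does not select an executor. It passes per-task settings (here the image) to the executor that is already configured. A minimal sketch of that idea follows, in plain Python so it runs without Airflow installed. The "KubernetesExecutor" key and the "image" field are copied from ex2; the helper name k8s_executor_config and the task id other_task are hypothetical, made up for illustration.

```python
# Sketch only: builds the per-task override dict used in the question's ex2.
# The "KubernetesExecutor" key and "image" field come from that example;
# the helper name and the "other_task" id are hypothetical.


def k8s_executor_config(image):
    """Return an executor_config dict that pins the image for one task."""
    if not image:
        raise ValueError("image must be a non-empty string")
    return {"KubernetesExecutor": {"image": image}}


# One entry per task. A task with no override (None) would run on whatever
# default image the executor itself is configured with.
task_overrides = {
    "print_date": k8s_executor_config("ubuntu:1604"),
    "other_task": None,
}

for task_id, override in task_overrides.items():
    print(task_id, override)
```

The dict returned for print_date is what ex2 passes as executor_config=... to BashOperator; a task without such an override simply omits the argument.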