I have 4 tasks in a DAG, where t1, t2 and t3 are BashOperators and t4 is a PythonOperator. t1's command downloads a protein structure from the NCBI database, t2 takes that structure, runs a job on it and outputs another structure, t3 takes t2's output, runs another job and outputs a csv file, and t4 cleans up and analyzes that csv file. My question is: what is the default location of the file downloaded by t1 and of the outputs of t2 and t3?

When I run t1's command outside of Airflow, the file is downloaded into the directory the command is run from, but I can't seem to find the file anywhere when it is run through Airflow. Also, where does t2 look for its input file by default when it runs its command? Can we change where the input files are looked for and where the outputs go?
Here are t1 & t2:

t1 = BashOperator(
    task_id='get_pdb_1',
    bash_command='$SCHRODINGER/utilities/getpdb -r 3hfm',
    dag=dag)

# $SCHRODINGER/utilities/getpdb -r 3hfm
# SCHRODINGER points to the software install and is set in my .bashrc; normally
# this command downloads a structure 3hfm.pdb into the directory it's run from.

t2 = BashOperator(
    task_id='prepare_pdb_1',
    bash_command='$SCHRODINGER/utilities/prepwizard 3hfm.pdb test1.pdb',
    retries=3,
    dag=dag)

# $SCHRODINGER/utilities/prepwizard 3hfm.pdb test1.pdb
# This command takes the structure 3hfm.pdb as input and writes test1.pdb
# in the directory it's run from.
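For context, the rest of the DAG file just wires the four tasks together in order. Roughly like this (the start_date/schedule and the t3/t4 names and bodies below are simplified placeholders, not my real ones; t1 and t2 are as shown above):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

# The DAG object is created near the top of the file, before t1 and t2 above.
dag = DAG('schro_run', start_date=datetime(2019, 9, 1), schedule_interval=None)

# ... t1 and t2 as shown above ...

t3 = BashOperator(
    task_id='run_job_2',  # placeholder task_id
    bash_command='echo "job that reads test1.pdb and writes a csv"',  # placeholder command
    dag=dag)

def clean_and_analyze_csv():
    """Placeholder for the real csv cleanup/analysis."""
    pass

t4 = PythonOperator(
    task_id='clean_csv',  # placeholder task_id
    python_callable=clean_and_analyze_csv,
    dag=dag)

t1 >> t2 >> t3 >> t4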
Here t1 succeeds and its log says it saved the file, but I cannot find where the file was saved, and t2 fails because it cannot find the input file 3hfm.pdb that t1's command should have downloaded.
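To figure out where the commands actually run, I was thinking of temporarily adding a throwaway task like the one below (hypothetical, not in the DAG yet), just to print the working directory Airflow gives the bash commands:

# Throwaway debugging task (hypothetical): print the working directory and its
# contents as seen by a BashOperator command.
debug_cwd = BashOperator(
    task_id='debug_cwd',
    bash_command='echo "cwd: $(pwd)" && ls -la',
    dag=dag)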
Output of t1:
[2019-09-16 12:47:16,767] {bash_operator.py:91} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=schro_run
AIRFLOW_CTX_TASK_ID=get_pdb_1
AIRFLOW_CTX_EXECUTION_DATE=2019-09-16T19:46:28.978931+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2019-09-16T19:46:28.978931+00:00
[2019-09-16 12:47:16,768] {bash_operator.py:105} INFO - Temporary script location: /var/folders/j0/gtzmrlh13v1660j7yq3zdt6r000991/T/airflowtmp0h31csx4/get_pdb_1kj71allt
[2019-09-16 12:47:16,768] {bash_operator.py:115} INFO - Running command: $SCHRODINGER/utilities/getpdb -r 3hfm
[2019-09-16 12:47:16,777] {bash_operator.py:124} INFO - Output:
[2019-09-16 12:47:18,707] {bash_operator.py:128} INFO - Downloading 3hfm...
[2019-09-16 12:47:19,001] {bash_operator.py:128} INFO - saved data to file: 3hfm.pdb
[2019-09-16 12:47:19,084] {bash_operator.py:132} INFO - Command exited with return code 0
[2019-09-16 12:47:20,754] {logging_mixin.py:95} INFO - [2019-09-16 12:47:20,754] {local_task_job.py:105} INFO - Task exited with return code 0
Output of t2:
[2019-09-16 13:04:13,867] {bash_operator.py:91} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_ID=schro_run
AIRFLOW_CTX_TASK_ID=prepare_pdb_1
AIRFLOW_CTX_EXECUTION_DATE=2019-09-16T19:46:28.978931+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2019-09-16T19:46:28.978931+00:00
[2019-09-16 13:04:13,868] {bash_operator.py:105} INFO - Temporary script location: /var/folders/j0/gtzmrlh13v1660j7yq3zdt6r000991/T/airflowtmpgc8jzd_v/prepare_pdb_1wrbbup5b
[2019-09-16 13:04:13,868] {bash_operator.py:115} INFO - Running command: $SCHRODINGER/utilities/prepwizard 3hfm.pdb test1.pdb
[2019-09-16 13:04:13,876] {bash_operator.py:124} INFO - Output:
[2019-09-16 13:04:15,725] {bash_operator.py:128} INFO - Usage: $SCHRODINGER/utilities/prepwizard [options] inputfile outputfile
prepwizard_startup.py: error: Error: input file not found: 3hfm.pdb
[2019-09-16 13:04:15,832] {bash_operator.py:132} INFO - Command exited with return code 2
[2019-09-16 13:04:15,839] {taskinstance.py:1051} ERROR - Bash command failed
Traceback (most recent call last):
File "/Users/chamiso/Documents/Random/randomRepos/airflow_practice/practice-airflow.env/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 926, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/chamiso/Documents/Random/randomRepos/airflow_practice/practice-airflow.env/lib/python3.7/site-packages/airflow/operators/bash_operator.py", line 136, in execute
raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
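My guess is that I could work around this by cd-ing into a fixed directory inside every bash_command, something like the sketch below (DATA_DIR is a made-up example path), but I'd still like to understand where the files end up by default and whether there is a cleaner way to control the working directory per task.

# Possible workaround I'm considering (untested): run every command from a
# fixed directory so downloads and outputs land in a known place.
# DATA_DIR is a made-up example path.
DATA_DIR = '/Users/chamiso/airflow_data'

t1 = BashOperator(
    task_id='get_pdb_1',
    bash_command=f'cd {DATA_DIR} && $SCHRODINGER/utilities/getpdb -r 3hfm',
    dag=dag)

t2 = BashOperator(
    task_id='prepare_pdb_1',
    bash_command=f'cd {DATA_DIR} && $SCHRODINGER/utilities/prepwizard 3hfm.pdb test1.pdb',
    retries=3,
    dag=dag)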