I want to run a DataFlow jar on Airflow as an automated job. When I run the following command, I get the exception below:

airflow test test-dag hello-dag 2018-03-26

What am I missing? I couldn't find any further information about this. Thanks a lot for your help.

Versions: Python 2.7.10, Airflow 1.9.0, pandas 0.22.0
Exception:
Traceback (most recent call last):
  File "/Users/henry/Documents/workspace/py27venv/bin/airflow", line 27, in <module>
    args.func(args)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/bin/cli.py", line 528, in test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/models.py", line 1584, in run
    session=session)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/models.py", line 1493, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/contrib/operators/dataflow_operator.py", line 121, in execute
    hook.start_java_dataflow(self.task_id, dataflow_options, self.jar)
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 149, in start_java_dataflow
    task_id, variables, dataflow, name, ["java", "-jar"])
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 143, in _start_dataflow
    self.get_conn(), variables['project'], name).wait_for_done()
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 31, in __init__
    self._job = self._get_job()
  File "/Users/henry/Documents/workspace/py27venv/lib/python2.7/site-packages/airflow/contrib/hooks/gcp_dataflow_hook.py", line 49, in _get_job
    if 'currentState' in job:
TypeError: argument of type 'NoneType' is not iterable
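The last frame fails because the hook's lookup of the Dataflow job by name returned None, which is what you would expect if the jar exited with an error before ever submitting a job (as the answer below confirms). A minimal repro of that failing membership test, with a hypothetical stand-in for the hook's state:

# Minimal repro (values are hypothetical stand-ins, not the hook's source):
job = None                   # what the lookup-by-name effectively returned
if 'currentState' in job:    # `in` on None raises the TypeError shown above
    print(job['currentState'])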
Code:
from datetime import timedelta, datetime
import json

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2018, 3, 26),
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'dataflow_default_options': {
        'project': 'test-123456',
        'zone': 'europe-west1-b',
        'stagingLocation': 'gs://hellodag/temp'
    }
}

my_dag = DAG('test-dag', default_args=default_args, schedule_interval=timedelta(1))

task_3 = DataFlowJavaOperator(
    jar='/Users/henry/Documents/workspace/helloairflow/target/helloairflow-0.0.1-SNAPSHOT.jar',
    options={
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '50',
        'start': '{{ds}}',
        'partitionType': 'DAY',
    },
    gcp_conn_id='gcp_service_account',
    task_id='hello-dag',
    dag=my_dag)
Answer 0 (score: 0)

The answer is as follows.
Based on the options set in task_3, Airflow will execute the following command:

java -jar helloairflow-0.0.1-SNAPSHOT.jar --autoscalingAlgorithm=BASIC --maxNumWorkers=50 --start= --partitionType=DAY ...
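As a rough illustration of that mapping (a simplified sketch, not the hook's actual source), each entry in options becomes a --key=value flag appended after java -jar:

# Simplified sketch of how the options dict becomes jar flags
# (illustration only, not Airflow's actual hook code):
options = {
    'autoscalingAlgorithm': 'BASIC',
    'maxNumWorkers': '50',
    'start': '{{ds}}',        # Airflow renders templated values before launch
    'partitionType': 'DAY',
}
cmd = ['java', '-jar', 'helloairflow-0.0.1-SNAPSHOT.jar']
cmd += ['--{}={}'.format(k, v) for k, v in options.items()]
print(' '.join(cmd))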
However, I had not defined the properties "start" and "partitionType" in the main function of helloairflow-0.0.1-SNAPSHOT.jar. When I then ran the command above in another terminal, I got the following exception:
java.lang.IllegalArgumentException: Class interface com.henry.cloud.dataflow.connector.MariaDBConnector$MariaDBConnOptions missing a property named 'start'.
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1579)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:104)
    at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:291)
    at com.henry.cloud.dataflow.connector.MariaDBConnector.main(MariaDBConnector.java:90)
In the end I removed those two properties from the options in task_3 and the job ran fine; a sketch of the fixed operator follows. (Judging from the exception, the alternative fix would be to declare matching getter/setter pairs for start and partitionType on the jar's PipelineOptions interface, since that is what PipelineOptionsFactory checks for.)
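For reference, a sketch of the working operator after the fix (reconstructed from the DAG code above, not copied from the original post):

task_3 = DataFlowJavaOperator(
    jar='/Users/henry/Documents/workspace/helloairflow/target/helloairflow-0.0.1-SNAPSHOT.jar',
    options={
        # 'start' and 'partitionType' removed: the jar's PipelineOptions
        # interface does not declare them, so PipelineOptionsFactory
        # rejects the corresponding flags
        'autoscalingAlgorithm': 'BASIC',
        'maxNumWorkers': '50',
    },
    gcp_conn_id='gcp_service_account',
    task_id='hello-dag',
    dag=my_dag)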