For my workflow I need to run a job with Spark 2. I couldn't find any examples or good documentation for the SparkSubmitOperator, but I tried using it anyway:

spark_submit = SparkSubmitOperator(
    task_id='task_id',
    application=string_with_path_to_jar_file,
    conf={
        'spark.sql.warehouse.dir': 'file:/tmp/',
        'spark.hadoop.fs.permissions.umask-mode': '002',
        'spark.serializer': 'org.apache.spark.serializer.KryoSerializer',
        'spark.network.timeout': '360s',
        'spark.yarn.executor.memoryOverhead': '5g',
        'spark.dynamicAllocation.maxExecutors': '100'
    },
    env_vars={
        'master': 'yarn',
        'deploy-mode': 'client'
    },
    java_class=some_java_class,
    executor_memory='12G',
    driver_memory='3G',
    num_executors=50,
    application_args=['app.properties'])
When the task runs, I get the following warning:
.local/lib/python2.7/site-packages/airflow/models.py:2160: PendingDeprecationWarning: Invalid arguments were passed to SparkSubmitOperator. Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
[2018-07-09 18:01:53,947] {base_task_runner.py:98} INFO - Subtask: *args: ()
[2018-07-09 18:01:53,947] {base_task_runner.py:98} INFO - Subtask: **kwargs: {'env_vars': {'deploy-mode': 'client', 'master': 'yarn'}}
[2018-07-09 18:01:53,947] {base_task_runner.py:98} INFO - Subtask: category=PendingDeprecationWarning
Now my question is:
I am probably using the SparkSubmitOperator the wrong way. Is there a good example or documentation for it, or does anyone know what I am doing wrong?
Answer 0 (score: 0)
Here is a working snippet for the SparkSubmitOperator:
SparkSubmitOperator(
    task_id='Extraction',
    application='../scala-2.11/ssot_2.11-0.1.jar',
    conn_id='spark_default',
    driver_class_path='../mysql-connector-java/jars/mysql-connector-java-8.0.17.jar',
    jars='../mysql-connector-java/jars/mysql-connector-java-8.0.17.jar',
    dag=dag)
Here spark_default refers to the Airflow connection that carries the Spark master and deploy-mode details.
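Building on that, below is a minimal, untested sketch of how the call from the question could look once master and deploy-mode are moved out of env_vars (which is what triggers the PendingDeprecationWarning) and into the spark_default connection. It assumes the Airflow 1.x contrib import path; string_with_path_to_jar_file and some_java_class are the placeholders from the question.

from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

# master and deploy-mode are taken from the spark_default connection
# (Host: yarn, Extra: {"deploy-mode": "client"}), not from env_vars.
spark_submit = SparkSubmitOperator(
    task_id='task_id',
    application=string_with_path_to_jar_file,  # placeholder from the question
    conn_id='spark_default',
    java_class=some_java_class,                # placeholder from the question
    conf={
        'spark.serializer': 'org.apache.spark.serializer.KryoSerializer',
        'spark.network.timeout': '360s',
        'spark.yarn.executor.memoryOverhead': '5g'
    },
    executor_memory='12G',
    driver_memory='3G',
    num_executors=50,
    application_args=['app.properties'],  # a list, not a set
    dag=dag)

The connection itself can be edited under Admin > Connections in the Airflow UI: set the connection type to Spark, the host to yarn, and the Extra field to {"deploy-mode": "client"}; the SparkSubmitHook reads the master and deploy mode from there.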