XCOM不适用于PythonVirtualenvOperator airflow 1.10.6

时间:2019-12-02 15:22:08

标签: python-3.x airflow

我在气流项目中使用PythonVirtualenvOperator,我需要将参数从一个任务传递给其他任务,以测试我使用此example的xcom,它可以工作,但是当我将pythonOperator更改为PythonVirtualenvOperator时,它有一个问题。

代码:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator, PythonVirtualenvOperator

args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(2),
    'provide_context': True,

}

dag = DAG('BAtatas', schedule_interval="@once", default_args=args)

value_1 = [1, 2, 3]
value_2 = {'a': 'b'}


def push(**kwargs):
    """Pushes an XCom without a specific target"""
    print(kwargs)
    kwargs['ti'].xcom_push(key='value from pusher 1', value=value_1)


def push_by_returning(**kwargs):
    """Pushes an XCom without a specific target, just by returning it"""
    return value_2


def puller(**kwargs):
    """Pull all previously pushed XComs and check if the pushed values match the pulled values."""
    ti = kwargs['ti']

    # get value_1
    pulled_value_1 = ti.xcom_pull(key=None, task_ids='push')
    assert pulled_value_1 == value_1

    # get value_2
    pulled_value_2 = ti.xcom_pull(task_ids='push_by_returning')
    assert pulled_value_2 == value_2

    # get both value_1 and value_2
    pulled_value_1, pulled_value_2 = ti.xcom_pull(key=None, task_ids=['push', 'push_by_returning'])
    assert (pulled_value_1, pulled_value_2) == (value_1, value_2)


push1 = PythonVirtualenvOperator(
    task_id='push',
    dag=dag,
    python_callable=push,
    requirements=['dill'],
    python_version='3.8',
    use_dill=True,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

push2 = PythonVirtualenvOperator(
    task_id='push_by_returning',
    dag=dag,
    python_callable=push_by_returning,
    requirements=['dill'],
    python_version='3.8',
    use_dill=True,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

pull = PythonVirtualenvOperator(
    task_id='puller',
    dag=dag,
    python_callable=puller,
    requirements=['dill'],
    python_version='3.8',
    use_dill=True,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

pull << [push1, push2]

错误:

[2019-11-29 17:58:04,745] {base_task_runner.py:113} INFO - Job 133: Subtask push Traceback (most recent call last):
[2019-11-29 17:58:04,745] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/bin/airflow", line 37, in <module>
[2019-11-29 17:58:04,746] {base_task_runner.py:113} INFO - Job 133: Subtask push     args.func(args)
[2019-11-29 17:58:04,747] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/utils/cli.py", line 74, in wrapper
[2019-11-29 17:58:04,748] {base_task_runner.py:113} INFO - Job 133: Subtask push     return f(*args, **kwargs)
[2019-11-29 17:58:04,748] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/bin/cli.py", line 551, in run
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     _run(args, dag, ti)
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/bin/cli.py", line 466, in _run
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     ti._run_raw_task(
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/utils/db.py", line 74, in wrapper
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     return func(*args, **kwargs)
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 930, in _run_raw_task
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     result = task_copy.execute(context=context)
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 113, in execute
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     return_value = self.execute_callable()
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 297, in execute_callable
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     self._write_args(input_filename)
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/operators/python_operator.py", line 337, in _write_args
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push     dill.dump(arg_dict, f)
[2019-11-29 17:58:04,749] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/dill/_dill.py", line 259, in dump
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     Pickler(file, protocol, **_kwds).dump(obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/dill/_dill.py", line 445, in dump
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     StockPickler.dump(self, obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 485, in dump
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     self.save(obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 558, in save
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     f(self, obj)  # Call unbound method with explicit self
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/dill/_dill.py", line 912, in save_module_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     StockPickler.save_dict(pickler, obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 969, in save_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     self._batch_setitems(obj.items())
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 995, in _batch_setitems
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     save(v)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 558, in save
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     f(self, obj)  # Call unbound method with explicit self
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/dill/_dill.py", line 912, in save_module_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     StockPickler.save_dict(pickler, obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 969, in save_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     self._batch_setitems(obj.items())
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 995, in _batch_setitems
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     save(v)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 558, in save
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     f(self, obj)  # Call unbound method with explicit self
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/dill/_dill.py", line 912, in save_module_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     StockPickler.save_dict(pickler, obj)
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 969, in save_dict
[2019-11-29 17:58:04,750] {base_task_runner.py:113} INFO - Job 133: Subtask push     self._batch_setitems(obj.items())
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 995, in _batch_setitems
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push     save(v)
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/usr/lib/python3.8/pickle.py", line 576, in save
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push     rv = reduce(self.proto)
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1194, in __getattr__
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push     self.var = Variable.get(item)
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/utils/db.py", line 74, in wrapper
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push     return func(*args, **kwargs)
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/models/variable.py", line 118, in get
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push     raise KeyError('Variable {} does not exist'.format(key))
[2019-11-29 17:58:04,751] {base_task_runner.py:113} INFO - Job 133: Subtask push KeyError: 'Variable __getstate__ does not exist'
[2019-11-29 17:58:07,380] {logging_mixin.py:112} INFO - [2019-11-29 17:58:07,379] {local_task_job.py:103} INFO - Task exited with return code 1

有什么建议吗?

我正在使用PythonVirtualenvOperator来个性化我的任务,并且在每个任务中有不同的请求/版本。

仍然无法使用python3.7,代码为:

import airflow
from airflow import DAG
from airflow.operators.python_operator import PythonOperator, PythonVirtualenvOperator

args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(2),
    'provide_context': True,

}

dag = DAG('BAtatas', schedule_interval="@once", default_args=args)

value_1 = [1, 2, 3]
value_2 = {'a': 'b'}


def push(**kwargs):
    """Pushes an XCom without a specific target"""
    print(kwargs)
    kwargs['ti'].xcom_push(key='value from pusher 1', value=value_1)


def push_by_returning(**kwargs):
    """Pushes an XCom without a specific target, just by returning it"""
    return value_2


def puller(**kwargs):
    """Pull all previously pushed XComs and check if the pushed values match the pulled values."""
    ti = kwargs['ti']

    # get value_1
    pulled_value_1 = ti.xcom_pull(key=None, task_ids='push')
    assert pulled_value_1 == value_1

    # get value_2
    pulled_value_2 = ti.xcom_pull(task_ids='push_by_returning')
    assert pulled_value_2 == value_2

    # get both value_1 and value_2
    pulled_value_1, pulled_value_2 = ti.xcom_pull(key=None, task_ids=['push', 'push_by_returning'])
    assert (pulled_value_1, pulled_value_2) == (value_1, value_2)


push1 = PythonVirtualenvOperator(
    task_id='push',
    dag=dag,
    python_callable=push,
    requirements=[],
    python_version='3.7',
    use_dill=False,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

push2 = PythonVirtualenvOperator(
    task_id='push_by_returning',
    dag=dag,
    python_callable=push_by_returning,
    requirements=[],
    python_version='3.7',
    use_dill=False,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

pull = PythonVirtualenvOperator(
    task_id='puller',
    dag=dag,
    python_callable=puller,
    requirements=[],
    python_version='3.7',
    use_dill=False,
    system_site_packages=True,
    op_args=None,
    op_kwargs=None,
)

pull << [push1, push2]

错误是:

[2019-12-04 14:08:59,579] {base_task_runner.py:113} INFO - Job 139: Subtask push     pickle.dump(arg_dict, f)
[2019-12-04 14:08:59,579] {base_task_runner.py:113} INFO - Job 139: Subtask push TypeError: cannot pickle 'module' object

当我使用use_dill=True,

错误是:

[2019-12-04 14:32:12,500] {base_task_runner.py:113} INFO - Job 141: Subtask push     return func(*args, **kwargs)
[2019-12-04 14:32:12,500] {base_task_runner.py:113} INFO - Job 141: Subtask push   File "/home/yk0l0diy/.local/share/virtualenvs/load_data-KWkJBdeu/lib/python3.8/site-packages/airflow/models/variable.py", line 118, in get
[2019-12-04 14:32:12,500] {base_task_runner.py:113} INFO - Job 141: Subtask push     raise KeyError('Variable {} does not exist'.format(key))
[2019-12-04 14:32:12,500] {base_task_runner.py:113} INFO - Job 141: Subtask push KeyError: 'Variable __getstate__ does not exist'
[2019-12-04 14:32:16,640] {logging_mixin.py:112} INFO - [2019-12-04 14:32:16,639] {local_task_job.py:103} INFO - Task exited with return code 1

2 个答案:

答案 0 :(得分:1)

这在气流 2.1.0 中已修复。这个问题在这里: https://github.com/apache/airflow/issues/15335

答案 1 :(得分:0)

问题在于调用虚拟环境时的序列化过程。该上下文包含泡菜/莳萝无法序列化的内容,因此在最新版本的Airflow中不再提供该上下文。

这意味着您无法在PythonVirtualenvOperator创建的虚拟环境中执行某些操作时,从上下文中检索内容或将其推送到上下文中。

从上下文传递虚拟环境的唯一方法是在呈现期间使用templates_dict或op_kwargs处理XCom,而不是尝试从虚拟环境内部访问上下文。

pull = PythonVirtualenvOperator(
    task_id='puller',
    op_kwargs={
        "result_push": "{{ ti.xcom_pull(task_ids='pusher') }}"
    },
)

从PythonVirtualenvOperator向Xcom发送内容的唯一方法是在虚拟环境中执行代码时使用标准输出(免责声明:我根本不建议这样做。我不知道是否甚至应该这样说:

pusher = PythonVirtualenvOperator(
    task_id='pusher',
    do_xcom_push=True,
    ...
)

PythonVirtualenvOperator将尝试执行return self._read_result(output_filename),该脚本的输出将存储在XCom中。

def push(**kwargs):
    print("test") # this value should appear inside the Xcom of that task if nothing else has printed something else and then you'll get a bunch of nothing.