I am trying to run a hive_operator task on Airflow 1.9.
The code is:
import airflow
from airflow.operators.hive_operator import HiveOperator
from airflow.hooks.hive_hooks import HiveCliHook
from airflow.models import DAG
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['support@mail.com'],
    'email_on_failure': True,
    'retries': 2,
    'retry_delay': timedelta(seconds=30),
    'catchup': False,
}

HiveCli_hook = HiveCliHook(hive_cli_conn_id='hive_cli_default')
hql = 'INSERT INTO test.test_table SELECT DISTINCT id FROM test.tabl_test;'

dag = DAG(
    dag_id='Hive_in_action',
    default_args=default_args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60))

create_test_table = HiveOperator(
    task_id="create_test_table",
    hql=hql,
    hive_cli_conn_id=HiveCli_hook,
    dag=dag
)
I am using a tunnel, which is why the connection points to localhost.
I get the error:
ERROR - 'HiveCliHook' object has no attribute 'upper'
The relevant part of the log:
[2018-04-09 16:40:14,672] {models.py:1428} INFO - Executing Task(HiveOperator): create_test_table> on 2018-04-09 14:39:08
[2018-04-09 16:40:14,672] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run Hive_in_action create_test_table 2018-04-09T14:39:08 --job_id 19 --raw -sd DAGS_FOLDER/Hive_in_action.py']
[2018-04-09 16:40:15,283] {base_task_runner.py:98} INFO - Subtask: [2018-04-09 16:40:15,282] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-04-09 16:40:15,361] {base_task_runner.py:98} INFO - Subtask: [2018-04-09 16:40:15,360] {models.py:189} INFO - Filling up the DagBag from /Users/mypc/airflow/dags/Hive_in_action.py
[2018-04-09 16:40:15,387] {base_task_runner.py:98} INFO - Subtask: [2018-04-09 16:40:15,387] {base_hook.py:80} INFO - Using connection to: localhost
[2018-04-09 16:40:15,400] {cli.py:374} INFO - Running on host MyPC.local
[2018-04-09 16:40:15,413] {base_task_runner.py:98} INFO - Subtask: [2018-04-09 16:40:15,412] {hive_operator.py:96} INFO - Executing: INSERT INTO test.test_table SELECT DISTINCT id FROM test.tabl_test;
[2018-04-09 16:40:15,412] {models.py:1595} ERROR - 'HiveCliHook' object has no attribute 'upper'
Traceback (most recent call last):
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/models.py", line 1493, in _run_raw_task
result = task_copy.execute(context=context)
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/operators/hive_operator.py", line 97, in execute
self.hook = self.get_hook()
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/operators/hive_operator.py", line 86, in get_hook
mapred_job_name=self.mapred_job_name)
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/hooks/hive_hooks.py", line 71, in __init__
conn = self.get_connection(hive_cli_conn_id)
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 77, in get_connection
conn = random.choice(cls.get_connections(conn_id))
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 68, in get_connections
conn = cls._get_connection_from_env(conn_id)
File "/Users/mypc/anaconda/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 60, in _get_connection_from_env
environment_uri = os.environ.get(CONN_ENV_PREFIX + conn_id.upper())
AttributeError: 'HiveCliHook' object has no attribute 'upper'
[2018-04-09 16:40:15,416] {models.py:1622} INFO - All retries failed; marking task as FAILED
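The traceback shows where the error comes from: the base hook builds an environment-variable name from the connection ID via `conn_id.upper()`, which only works when `conn_id` is a string. A minimal stdlib-only sketch of that lookup (the `FakeHook` class and `get_connection_env_key` helper are illustrative stand-ins, not real Airflow code) reproduces the failure:

```python
CONN_ENV_PREFIX = 'AIRFLOW_CONN_'  # prefix Airflow 1.9 uses for env-based connections


class FakeHook:
    """Stand-in for HiveCliHook: any non-string object triggers the same error."""
    pass


def get_connection_env_key(conn_id):
    # Mirrors base_hook._get_connection_from_env: conn_id must be a string,
    # because .upper() is called on it to build the env-var name.
    return CONN_ENV_PREFIX + conn_id.upper()


print(get_connection_env_key('hive_cli_default'))  # AIRFLOW_CONN_HIVE_CLI_DEFAULT

try:
    get_connection_env_key(FakeHook())  # a hook object instead of a string
except AttributeError as e:
    print(e)  # 'FakeHook' object has no attribute 'upper'
```

Passing the `HiveCli_hook` object as `hive_cli_conn_id` hits exactly this code path.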
Answer 0 (score: 0)
You should not give a variable or object the same name as its class:
HiveCliHook = HiveCliHook(...)
Use a different name instead:
myHook = HiveCliHook(...)
create_test_table = HiveOperator(
    ...
    hive_cli_conn_id=myHook,
    ...)
Answer 1 (score: 0)
It looks like you are passing a HiveCliHook object as the hive_cli_conn_id. The HiveOperator expects a string here and calls upper() on it to convert it to uppercase, so the line hive_cli_conn_id=HiveCli_hook, is what causes the error.
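Following that diagnosis, a sketch of the corrected task definition: pass the connection ID string ('hive_cli_default', the ID already used in the question) to hive_cli_conn_id and drop the manually constructed hook, since HiveOperator builds its own hook internally. This is an untested sketch against the Airflow 1.9 API, not a verified drop-in:

```python
import airflow
from airflow.operators.hive_operator import HiveOperator
from airflow.models import DAG
from datetime import timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['support@mail.com'],
    'email_on_failure': True,
    'retries': 2,
    'retry_delay': timedelta(seconds=30),
    'catchup': False,
}

dag = DAG(
    dag_id='Hive_in_action',
    default_args=default_args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60))

create_test_table = HiveOperator(
    task_id="create_test_table",
    hql='INSERT INTO test.test_table SELECT DISTINCT id FROM test.tabl_test;',
    hive_cli_conn_id='hive_cli_default',  # a connection ID string, not a hook object
    dag=dag
)
```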