Question

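I'm trying to run Hive SQL through Airflow using the JdbcHook and a Jinja template. The template works for a single SQL statement, but throws a parse error when it contains multiple statements.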

DAG

    p1 = JdbcOperator(
        task_id=DAG_NAME + '_create',
        jdbc_conn_id='big_data_hive',
        sql='/mysql_template.sql',
        params={'env': ENVIRON},
        autocommit=True,
        dag=dag)

Template

    create table {{params.env}}_fct.hive_test_templated
    (cookie_id string
    ,sesn_id string
    ,load_dt string)
    ;

    INSERT INTO {{params.env}}_fct.hive_test_templated
    select * from {{params.env}}_fct.hive_test
    ;

Error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 7:0 missing EOF at ';' near ')'

The templated query works fine when I run it in Hue.

Answer 1

tobi is right: the simplest approach is to split the SQL into a list of statements and execute them in sequence.

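The way I do it is to use the sqlparse Python library to split the string into a list of SQL statements and then pass that list to a hook that inherits from DbApiHook. The DbApiHook base class accepts a list of SQL statements and executes them sequentially, and the same could easily be implemented in the Hive hook. In the following example, my "CustomSnowflakeHook" inherits from DbApiHook, whose run method accepts a list of SQL statements: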

    hook = hooks.CustomSnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
    sql = sqlparse.split(sqlparse.format(self.sql, strip_comments=True))
    hook.run(
        sql,
        autocommit=self.autocommit,
        parameters=self.parameters)
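The snippet above relies on sqlparse for the splitting step. As a dependency-free illustration of what that step does, here is a simplified splitter written for this example; split_statements is a hypothetical helper, not part of sqlparse or Airflow. It splits on semicolons that are outside quoted strings and backtick identifiers and drops -- line comments, but it does not handle /* */ block comments or escaped quotes, so sqlparse remains the more robust choice. One difference worth knowing: sqlparse.split keeps each statement's trailing semicolon, while this sketch drops it. Given the "missing EOF at ';'" error, it may be worth checking whether your Hive driver accepts a trailing ';'.

```python
from typing import List


def split_statements(script: str) -> List[str]:
    """Split a SQL script into statements on top-level semicolons.

    Hypothetical, simplified stand-in for
    sqlparse.split(sqlparse.format(sql, strip_comments=True)):
    - semicolons inside '...', "..." or `...` are not treated as separators
    - '--' line comments are removed
    - empty statements are discarded and the separating ';' is dropped
    Block comments (/* */) and escaped quotes are NOT handled.
    """
    statements = []
    current = []
    quote = None  # quote character we are currently inside, if any
    i = 0
    while i < len(script):
        ch = script[i]
        if quote:
            # Inside a quoted string/identifier: copy verbatim until it closes.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif script.startswith("--", i):
            # Line comment: skip up to (not including) the newline.
            end = script.find("\n", i)
            if end == -1:
                break
            i = end
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Usage: the template from the question after rendering, assuming params.env == "dev".
script = """
create table dev_fct.hive_test_templated
(cookie_id string
,sesn_id string
,load_dt string)
;

INSERT INTO dev_fct.hive_test_templated
select * from dev_fct.hive_test
;
"""
stmts = split_statements(script)
for stmt in stmts:
    print(stmt)
    print("---")
```

Each element of the resulting list can then be handed to a hook's run method as a list, so the driver receives one statement per execute call.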

From DbApiHook:

    def run(self, sql, autocommit=False, parameters=None):
        """
        Runs a command or a list of commands. Pass a list of sql
        statements to the sql parameter to get them to execute
        sequentially
        :param sql: the sql statement to be executed (str) or a list of
            sql statements to execute
        :type sql: str or list
        :param autocommit: What to set the connection's autocommit setting to
            before executing the query.
        :type autocommit: bool
        :param parameters: The parameters to render the SQL query with.
        :type parameters: mapping or iterable
        """
        if isinstance(sql, basestring):
            sql = [sql]

        with closing(self.get_conn()) as conn:
            if self.supports_autocommit:
                self.set_autocommit(conn, autocommit)

            with closing(conn.cursor()) as cur:
                for s in sql:
                    if sys.version_info[0] < 3:
                        s = s.encode('utf-8')
                    self.log.info(s)
                    if parameters is not None:
                        cur.execute(s, parameters)
                    else:
                        cur.execute(s)

            if not getattr(conn, 'autocommit', False):
                conn.commit()
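To see the sequential-execution pattern of DbApiHook.run without a Hive cluster, here is a minimal sketch of the same loop against Python's built-in sqlite3 module, used purely as a stand-in DB-API driver; run_statements is a hypothetical helper and the tables are illustrative. Each statement goes to cursor.execute on its own, so the driver never sees more than one statement per call. As in DbApiHook.run, the same parameters are passed to every statement in the list, so each statement must accept them.

```python
import sqlite3
from contextlib import closing


def run_statements(conn, sql, parameters=None):
    """Execute one SQL string or a list of them, one statement per execute() call.

    Hypothetical helper mirroring the loop in DbApiHook.run: a lone string is
    wrapped in a list, statements run in order on a single cursor, and the
    transaction is committed once at the end.
    """
    if isinstance(sql, str):
        sql = [sql]
    with closing(conn.cursor()) as cur:
        for stmt in sql:
            if parameters is not None:
                cur.execute(stmt, parameters)
            else:
                cur.execute(stmt)
    conn.commit()


# Usage: an in-memory SQLite database stands in for Hive here.
conn = sqlite3.connect(":memory:")
run_statements(conn, [
    "CREATE TABLE hive_test (cookie_id TEXT, sesn_id TEXT, load_dt TEXT)",
    "INSERT INTO hive_test VALUES ('c1', 's1', '2018-01-01')",
    "CREATE TABLE hive_test_templated (cookie_id TEXT, sesn_id TEXT, load_dt TEXT)",
    "INSERT INTO hive_test_templated SELECT * FROM hive_test",
])
rows = conn.execute("SELECT * FROM hive_test_templated").fetchall()
print(rows)
```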

Answer 2

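It seems to me that Hue parses the statements differently: some clients implement a statement separator, which is what allows this to work.

Airflow does not seem to have such a separator.

So the simplest approach is to split the two statements apart and execute them in two separate tasks.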

How to run multiple SQL statements in an Airflow Jinja template using the JDBC hook
