I am trying to execute Hive SQL from Spark. The command below works fine with spark-sql, and the same statement also works from spark-submit running a Python script with embedded Hive SQL.
spark-sql --master yarn -e "select count(*) from adhoc.dual;"
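For reference, the Python variant I run through spark-submit looks roughly like this (a minimal sketch; the app name is arbitrary and adhoc.dual is the same test table as above):

from pyspark.sql import SparkSession

# Minimal sketch of the spark-submit variant: open a Hive-enabled session and run the same query
spark = SparkSession.builder.appName("hive-sql-from-python").enableHiveSupport().getOrCreate()
spark.sql("select count(*) from adhoc.dual").show()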
However, I cannot execute spark-sql --master yarn -e "select current_user() from adhoc.dual;"
I get the following error:
Error in query: Undefined function: 'current_user'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
current_user is a permanent Hive function.
Similarly with PySpark: after registering a UDF as a temporary function, I can execute most Hive statements, including ones that involve user-defined UDFs: CREATE TEMPORARY FUNCTION foo AS 'com.example.foo';
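For example, this is roughly how I register and call such a function from PySpark (a sketch; com.example.foo stands in for the real UDF class, whose jar is already on the classpath, and dummy is a placeholder column name):

# Register the Hive UDF as a temporary function, then call it in a query
spark.sql("CREATE TEMPORARY FUNCTION foo AS 'com.example.foo'")
spark.sql("select foo(dummy) from adhoc.dual").show()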
But executing Hive's CREATE TEMPORARY MACRO statement fails:
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import IntegerType, StringType
import pyspark.sql.functions as F
spark = SparkSession.builder.appName("Python Spark Hive example").enableHiveSupport().getOrCreate()
spark.sql("CREATE TEMPORARY MACRO PacificTzDate(ts BIGINT) DATE(FROM_UTC_TIMESTAMP(FROM_UNIXTIME(ts, 'yyyy-MM-dd HH:mm:ss'), 'America/Los_Angeles')")
This fails with the following error:
pyspark.sql.utils.ParseException: u"\nOperation not allowed: CREATE TEMPORARY MACRO(line 1, pos 0)\n\n== SQL ==\nCREATE TEMPORARY MACRO PacificTzDate(ts BIGINT) DATE(FROM_UTC_TIMESTAMP(FROM_UNIXTIME(ts, 'yyyy-MM-dd HH:mm:ss'), 'America/Los_Angeles')\n^^^\n"
19/01/02 20:09:49 INFO SparkContext: Invoking stop() from shutdown hook
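For what it's worth, I know the logic the macro wraps can be written inline in Spark SQL instead; a sketch, assuming a hypothetical table events with a BIGINT column ts, and using to_date in place of Hive's DATE(...):

# Inline equivalent of the macro body, without CREATE TEMPORARY MACRO
spark.sql("""
    select to_date(from_utc_timestamp(from_unixtime(ts, 'yyyy-MM-dd HH:mm:ss'),
                                      'America/Los_Angeles')) as pacific_tz_date
    from events
""").show()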