I am trying to execute a Spark action in an Oozie workflow. It runs fine until I try to access a Hive external table through Spark. My workflow:
<action name="SparkJob">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobtracker}</job-tracker>
        <name-node>${namenode}</name-node>
        <configuration>
            <property>
                <name>mapred.compress.map.output</name>
                <value>true</value>
            </property>
        </configuration>
        <master>yarn-cluster</master>
        <name>name</name>
        <class>classname</class>
        <jar>jar file</jar>
        <spark-opts>--conf spark.yarn.queue=${queuename_nonp} --conf spark.ui.port=5050</spark-opts>
    </spark>
    <ok to="EMAIL_SUCCESS"/>
    <error to="EMAIL_FAILURE"/>
</action>
When the Oozie job fails, I check the YARN logs using the application ID given in the Oozie output. The error reported there is:
2019-06-20 17:14:56,602 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier use found
use instance_name
^
java.lang.RuntimeException: [1.1] failure: ``with'' expected but identifier use found
use instance_name.
^
The code this error refers to is the Spark job below, which I package as the jar used in the workflow.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions.{from_unixtime, regexp_replace}
import org.apache.spark.sql.DataFrameNaFunctions

object loadHiveTable {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark-transformation-001")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    // Load the source Avro file and expose it as a temporary table.
    val df = sqlContext.read.format("com.databricks.spark.avro").load("avro_file")
    val temptable = "temp_emp"
    df.registerTempTable(temptable)

    val tempquery = "select * from temp_emp"
    val result = sqlContext.sql(tempquery)
    result.show()

    // Switch to the target Hive database -- this is the statement the parser rejects.
    val setQuery = "use instance_name"
    sqlContext.sql(setQuery)

    // Create the partitioned external table if it does not exist yet.
    val queryCreateTable = "CREATE EXTERNAL TABLE IF NOT EXISTS EMPLOYEES_Spark(\n EMPLOYEE_ID INT,\n FIRST_NAME STRING,\n LAST_NAME STRING,\n EMAIL STRING,\n PHONE_NUMBER STRING,\n HIRE_DATE DATE,\n JOB_ID STRING,\n SALARY DECIMAL(8,2),\n COMMISSION_PCT DECIMAL(2,2),\n MANAGER_ID INT\n )\n PARTITIONED BY (DEPARTMENT_ID INT)\n LOCATION 'path'\n tblproperties (\"skip.header.line.count\"=\"1\")"
    sqlContext.sql(queryCreateTable)

    // Enable dynamic partitioning, then load the staged data into the table.
    sqlContext.setConf("hive.exec.dynamic.partition", "true")
    sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
    val insertQuery = "insert overwrite table employees_spark partition (department_id) select * from temp_emp"
    sqlContext.sql(insertQuery)
  }
}
I would like to know what goes wrong when this is run as a jar file through Oozie.
When I run the same Spark queries in the spark-shell, I get correct results all the way through the last step, and I can see the data loaded into the Hive external table.
Answer (score 0):
You need to use HiveContext instead of SQLContext. The plain SQLContext in Spark 1.x ships with a simple SQL parser that does not understand HiveQL statements such as "use instance_name", which is exactly what the error "``with'' expected but identifier use found" is complaining about. HiveContext parses HiveQL and resolves tables through the Hive metastore. That is also why the same code works in the spark-shell: a shell built with Hive support exposes its sqlContext as a HiveContext.
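A minimal sketch of the change, keeping the placeholder names from the question (avro_file, instance_name) and assuming the spark-hive dependency is on the classpath; everything else in the job can stay the same, only the context construction differs:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object loadHiveTable {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark-transformation-001")
    val sc = new SparkContext(conf)

    // HiveContext understands HiveQL (USE, CREATE EXTERNAL TABLE, dynamic
    // partitions) and resolves tables through the Hive metastore.
    val sqlContext = new HiveContext(sc)

    val df = sqlContext.read.format("com.databricks.spark.avro").load("avro_file")
    df.registerTempTable("temp_emp")

    // These statements are rejected by the plain SQLContext parser but are
    // valid HiveQL under HiveContext.
    sqlContext.sql("use instance_name")
    sqlContext.setConf("hive.exec.dynamic.partition", "true")
    sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
    sqlContext.sql("insert overwrite table employees_spark partition (department_id) select * from temp_emp")
  }
}

One more thing worth checking when this runs in yarn-cluster mode from Oozie: the driver needs a hive-site.xml pointing at your metastore on its classpath (for example shipped via --files in spark-opts), otherwise HiveContext may fall back to a local embedded metastore that contains none of your tables.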