I have a Hive table. I want to build dynamic Spark SQL queries. At spark-submit time I specify a rule name, and the query should be generated based on that rule name. For example:
spark-submit <RuleName> IncorrectAge
It should fire my Scala object code:
select tablename, filter, condition from all_rules where rulename="IncorrectAge"
My table: rules (input table)
|rowkey| rule_name    | rule_run_status | tablename       | condition | filter        | level |
|------|--------------|-----------------|-----------------|-----------|---------------|-------|
| 1    | IncorrectAge | In_Progress     | VDP_Vendor_List | age>18    | gender=Male   | NA    |
| 2    | Customer_age | In_Progress     | Customer_List   | age<25    | gender=Female | NA    |
I fetch the rule details with:
select tablename, filter, condition from all_rules where rulename="IncorrectAge";
After executing this query I get a result like this:
| tablename       | filter      | condition |
|-----------------|-------------|-----------|
| VDP_Vendor_List | gender=Male | age>18    |
Now I want to run Spark SQL queries built dynamically from those values:
select count(*) from VDP_Vendor_List                          -- tablename only
select count(*) from VDP_Vendor_List where gender=Male        -- tablename and filter
select * from VDP_Vendor_List where gender=Male AND age>18    -- tablename, filter, condition
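Roughly, this is the kind of string building I have in mind, sketched with the values hard-coded the way I expect to get them back from the metadata query (the object name DynamicQuerySketch is just for illustration):

object DynamicQuerySketch {
  def main(args: Array[String]): Unit = {
    val tableName = "VDP_Vendor_List"   // tablename column
    val filter    = "gender=Male"       // filter column
    val condition = "age>18"            // condition column

    // the three statements I want to build, via string interpolation
    val q1 = s"select count(*) from $tableName"
    val q2 = s"select count(*) from $tableName where $filter"
    val q3 = s"select * from $tableName where $filter AND $condition"

    Seq(q1, q2, q3).foreach(println)
  }
}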
My code (Spark 2.2):
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.log4j._

object allrules {
  def main(args: Array[String]) {
    val spark = SparkSession.builder().master("local[*]")
      .appName("Spark Hive")
      .enableHiveSupport().getOrCreate()

    import spark.implicits._

    // for testing purposes I converted the Hive table to JSON data
    val sampleDF = spark.read.json("C:/software/sampletableCopy.json")
    sampleDF.createOrReplaceTempView("sampletable")   // registerTempTable is deprecated in Spark 2.x

    val allrulesDF = spark.sql("SELECT * FROM sampletable")
    allrulesDF.show()

    val TotalCount: Long = allrulesDF.count()
    println("==============> Total count ======> " + TotalCount)

    val df1 = allrulesDF.select(allrulesDF.col("tablename"), allrulesDF.col("condition"), allrulesDF.col("filter"), allrulesDF.col("rule_name"))
    df1.show()

    // keep the filtered DataFrame separate, since show() returns Unit
    val df2 = df1.where(df1.col("rule_name").equalTo("IncorrectAge"))
    df2.show()

    //    var table_name = ""
    //    var condition = ""
    //    var filter = ""
    //    df1.foreach(row => {
    //      table_name = row.get(1).toString()
    //      condition = row.get(2).toString()
    //      filter = row.get(3).toString()
    //    })
  }
}
Answer 0 (score 0):
You can pass parameters from spark-submit to your application:
bin/spark-submit --class allrules something.jar tablename filter condition
Then, in your main function, you will have the parameters:
def main(args: Array[String]): Unit = {
  // args(0), args(1) ... these are your params
}
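For example, here is a minimal sketch of feeding that first argument into the metadata lookup from your question (the all_rules table and rule_name column are taken from the question; the argument layout and object name are assumptions):

import org.apache.spark.sql.SparkSession

object allrules {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Spark Hive")
      .enableHiveSupport()
      .getOrCreate()

    // e.g. "IncorrectAge" from: spark-submit --class allrules something.jar IncorrectAge
    val ruleName = args(0)

    // same metadata lookup as in the question, now driven by the argument
    val metadataDF = spark.sql(
      s"select tablename, filter, condition from all_rules where rule_name = '$ruleName'")
    metadataDF.show()
  }
}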
Answer 1 (score 0):
You can pass parameters to the driver class like this:
import org.apache.log4j.Logger
import org.apache.spark.sql.SparkSession

object DriverClass {
  val log = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("yarn")
      .config("spark.sql.warehouse.dir", "path")
      .enableHiveSupport().getOrCreate()

    if (args == null || args.isEmpty || args.length != 2) {
      log.error("Invalid number of arguments passed.")
      log.error("Arguments Usage: <Rule Name> <Rule Type>")
      log.error("Stopping the flow")
      System.exit(1)
    }

    import spark.implicits._

    val ruleName: String = String.valueOf(args(0).trim())
    val ruleType: String = String.valueOf(args(1).trim())

    // build the metadata query from the arguments (note the s interpolator);
    // args(0) is the column to filter on and args(1) its value, e.g. rule_name IncorrectAge
    val processSQL: String = s"select tablename, filter, condition from all_rules where $ruleName='$ruleType'"
    val metadataDF = spark.sql(processSQL)

    // take the first row and pull out the three values
    val (tblnm, fltr, cndtn) = metadataDF.rdd
      .map(f => (f.get(0).toString(), f.get(1).toString(), f.get(2).toString()))
      .collect()(0)

    val finalSql_1 = s"select count(*) from $tblnm"                    // tablename only
    val finalSql_2 = s"select count(*) from $tblnm where $fltr"        // tablename and filter
    val finalSql_3 = s"select * from $tblnm where $fltr AND $cndtn"    // tablename, filter and condition

    spark.sql(finalSql_1).show()
    spark.sql(finalSql_2).show()
    spark.sql(finalSql_3).show()
  }
}
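With that in place, the submit command would look something like this (the jar name is a placeholder; as noted above, the first argument is treated as the column to filter on and the second as its value):

spark-submit --class DriverClass your-app.jar rule_name IncorrectAge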