I have a question about how to pass two dates as arguments to spark-submit so the code can use them.
I used the following with spark-submit; the dates are stored as strings in the table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)
start_date = arg1
end_date = arg2

def UDF_df(i):
    print(i[0])
    ABC2 = spark.sql(
        "select * from A where day = '{0}'".format(i[0])
    )
    Join = ABC2.join(
        Tab2,
        ABC2.ID == Tab2.ID
    ).select(
        Tab2.skey,
        ABC2.Day,
        ABC2.Name,
        ABC2.Description
    )
    Join.select(
        "skey",
        "Day",
        "Name",
        "Description"
    ).write.mode("append").format("parquet").insertInto("Table")

ABC = spark.sql(
    "select distinct day from A where day >= start_date and day <= end_date"
)
Tab2 = spark.sql("select * from B where day is not null")
for i in ABC.collect():
    UDF_df(i)
The above code isn't picking up arg1 and arg2, and so it fails with an error.
Answer 0 (score: 1)
If it's a Python script, try using the sys module.
import sys
start_date=sys.argv[1]
end_date=sys.argv[2]
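The arguments are then supplied positionally after the script name on the spark-submit command line, and since sys.argv values are strings they can be formatted straight into the query. A rough sketch (the script name and dates here are only placeholders):

spark-submit your_script.py 2019-03-04 2019-03-07

# inside the script, substitute the parsed dates into the driving query
ABC = spark.sql(
    "select distinct day from A where day >= '{0}' and day <= '{1}'".format(start_date, end_date)
)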
Answer 1 (score: 0)
Try this -
spark-submit \
.
.
--conf "spark.service.user.startdate=2019-03-04" \
--conf "spark.service.user.enddate=2019-03-07" \
.
.
In your code, refer to the above config properties as -
spark.sparkContext.getConf().get("spark.service.user.startdate")
spark.sparkContext.getConf().get("spark.service.user.enddate")
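Note that SparkConf.get() also accepts a default value as its second argument, which is useful if a run omits the --conf flags; for example (the fallback date here is just a placeholder):

start_date = spark.sparkContext.getConf().get("spark.service.user.startdate", "2019-01-01")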
Hope this helps.
Answer 2 (score: 0)
The cleanest way to pass arguments to a Spark script is to pass named arguments using the argparse module.
pip install argparse
Sample code:
import argparse

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--date1', help='pass the first date')
parser.add_argument('--date2', help='pass the 2nd date')
args = parser.parse_args()
start_date = args.date1
end_date = args.date2
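The named flags then go after the application file on the spark-submit command line, for example (your_script.py and the dates are placeholders):

spark-submit your_script.py --date1 2019-03-04 --date2 2019-03-07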
More about argparse: https://docs.python.org/2/library/argparse.html