I'm trying to write a function in Azure Databricks. I want to use spark.sql inside the function, but it seems I can't use it on worker nodes.
def SEL_ID(value, index):
    # some processing on value here
    ans = spark.sql("SELECT id FROM table WHERE bin = index")
    return ans

spark.udf.register("SEL_ID", SEL_ID)
I get the following error:
PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Is there any way around this? I'm using the function above to select from another table.