How to use a global variable in a PySpark function

Date: 2018-06-23 17:59:41

Tags: python-3.x apache-spark pyspark

First, there are two variables at the top of the code:

numericColumnNames = []
categoricalColumnsNames = []

Then, in the main method, I assign values to them:

def main():
  #clickRDD = sc.textFile("s3a://wer-display-ads/day_0_1000.csv")
  clickRDD = sc.textFile("data/day_0_1000.csv")
  numericColumnNames, categoricalColumnsNames = getColumnStructure()

Then, when I try to use these variables in the following function, they have not been updated and are empty:

def dataToVectorForLinear(clickDF):
  print(categoricalColumnsNames)  # why is this list empty?
  clickDF = oneHotEncoding(clickDF, categoricalColumnsNames)

Unfortunately, I can't figure out what the problem is. Thanks for your help.
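The behavior can be reproduced in a minimal example with no Spark involved (the `getColumnStructure` stub below stands in for the asker's helper): assigning to a name anywhere inside a function makes that name local to the function, so the module-level lists are never touched.

```python
# Minimal reproduction of the pitfall: assignment inside a function
# binds a LOCAL name, leaving the module-level list unchanged.

categoricalColumnsNames = []

def getColumnStructure():
    # Hypothetical stand-in for the asker's helper function.
    return ["price"], ["color", "brand"]

def main():
    # These assignments create new local variables named
    # numericColumnNames and categoricalColumnsNames.
    numericColumnNames, categoricalColumnsNames = getColumnStructure()

main()
print(categoricalColumnsNames)  # → [] -- the global was never updated
```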

1 Answer:

Answer 0 (score: -1)

Just declare them with the `global` keyword inside each function, like this:

def main():

    global numericColumnNames
    global categoricalColumnsNames     

    clickRDD = sc.textFile("data/day_0_1000.csv")
    numericColumnNames, categoricalColumnsNames = getColumnStructure()

Similarly:

def dataToVectorForLinear(clickDF):

    global categoricalColumnsNames
    print(categoricalColumnsNames)
    clickDF = oneHotEncoding(clickDF, categoricalColumnsNames)
