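DataFrame not displayed in PySpark

Time: 2019-04-18 22:24:38

Tags: python pyspark databricks

I am trying to display a DataFrame, but somehow it always tells me that df is not defined! How can that be? Here is the code: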

from pyspark.sql.functions import to_date

# The post omits the enclosing function definition; this placeholder line mirrors the one used in the answer.
def your_function(...):
    for key, val in mapping_dict.items():
        target_table = key
        files, query, schema = val
        for file in files:
            try:
                df = sqlContext.read.format('csv').options(header='true', charset='UTF-16').schema(schema).load(file)
                # Convert column names to lowercase and replace spaces with underscores.
                df = df.toDF(*[c.lower().replace(' ', '_') for c in df.columns])
                # Convert the string column "date" to a date type.
                df = df.withColumn("date", to_date(df['date']))
                # Expose the DataFrame to SQL under the name "dataTable", then run the query.
                df.registerTempTable("dataTable")
                df = sqlContext.sql(query)
            except Exception as e:
                print(e)
    return print("The loading is completed!")

df.head()

The error is NameError: name 'df' is not defined

1 answer:

Answer 0 (score: 0)

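This is a scoping issue: df is assigned inside the function, so it is local to that function and does not exist at the top level where df.head() is called. It is worth reading up on best practices for structuring code, or asking someone to help you structure yours.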
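The scoping behaviour can be reproduced without Spark. The following is a minimal sketch, not the asker's code: load_data is a hypothetical function name and the list is only a stand-in for a DataFrame. It shows that a name assigned inside a function is not visible at module level after the function returns.

```python
def load_data():
    # Assigning inside a function creates a local name by default.
    df = ["row1", "row2"]  # stand-in for a Spark DataFrame (assumption)
    print("The loading is completed!")


load_data()

try:
    df  # the local df ceased to exist when load_data() returned
    outcome = "df is defined"
except NameError as e:
    outcome = str(e)

print(outcome)  # name 'df' is not defined
```

The message matches the error reported in the question, which suggests the same mechanism is at work there.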

A quick and dirty solution (if this is a one-off script) is to put global df at the top of the function:

def your_function(...):
    global df

    for key, val in mapping_dict.items():
        target_table = key
        files, query, schema = val
        for file in files:
            ...

your_function(...)  # df is only assigned once the function has actually run
df.head()
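If the script is more than a one-off, one possible alternative to global is to have the function return what it builds. The sketch below is an assumption-laden illustration rather than the answer's method: load_tables, the mapping contents and the file names are hypothetical, and a plain list comprehension stands in for the Spark read and transform steps. Note that in the question's loop df is reassigned on every iteration, so only the last result would survive; collecting results in a dict keyed by target table keeps all of them.

```python
def load_tables(mapping):
    """Return one result per target table instead of relying on a global."""
    results = {}
    for target_table, files in mapping.items():
        # Stand-in for the Spark read/transform steps from the question (assumption).
        results[target_table] = [name.lower().replace(" ", "_") for name in files]
    return results


# Hypothetical input: one target table with two source files.
tables = load_tables({"sales": ["Jan Report.csv", "Feb Report.csv"]})
print(tables["sales"])  # ['jan_report.csv', 'feb_report.csv']
```

The caller then picks the table it wants, for example tables["sales"], without any name leaking out of the function.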