"name 'sc' is not defined" when running Python code

Date: 2019-01-03 19:12:28

Tags: pyspark

I am running the following code with spark-submit (Spark 2.3.0) and getting "NameError: name 'sc' is not defined":

    from pyspark.sql import SQLContext
    from pyspark.sql.functions import col, lit
    from pyspark.sql.types import *

    if __name__ == "__main__":
        sc = SparkContext()

        sqlContext = SQLContext(sc)
        forecast = sc.read.load('/user/gg/LV_hadoop_example.csv',
                                format='csv', header='true', inferSchema='true', sep=',')
        forecast = forecast.filter(forecast['Total_scaled_forecast'] > 0)
        forecast.saveAsTextFile("word_count11.txt")

1 Answer:

Answer 0 (score: 2)

In Spark 2.3.0, the correct way to load a CSV file is:

    from pyspark.sql import SparkSession

    # initiate spark instance
    spark = (SparkSession.builder
             .master("local")
             .appName("abc")
             .getOrCreate())

    # read csv file
    df = spark.read.csv('/user/gg/LV_hadoop_example.csv')

See the documentation for more examples.