Question

I am trying to run a spark-ml example, but

from pyspark import SparkContext
import pyspark.sql 

sc = SparkContext(appName="PythonStreamingQueueStream")    
training = sqlContext.createDataFrame([
(1.0, Vectors.dense([0.0, 1.1, 0.1])),
(0.0, Vectors.dense([2.0, 1.0, -1.0])),
(0.0, Vectors.dense([2.0, 1.3, 1.0])),
(1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

it does not run, because the terminal tells me

NameError: name 'sqlContext' is not defined

Why does this happen, and how can I fix it?

Answer 1

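If you are using the Apache Spark 1.x line (i.e., before Apache Spark 2.0), sqlContext is not defined for you in a script; you need to import SQLContext and create a sqlContext from your SparkContext, i.e.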

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

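If you are using Apache Spark 2.0 or later, you can use the SparkSession directly instead. In the pyspark shell it is already defined as spark; in a standalone script you create it with SparkSession.builder.getOrCreate(). Your code then becomes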

training = spark.createDataFrame(...)

For details, see the Spark SQL Programming Guide.

Why does Spark tell me "name 'sqlContext' is not defined", and how can I use sqlContext?
