Question

I'm trying to load an SVM file and convert it to a DataFrame so that I can use the ML module (Pipeline ML) from Spark. I've just installed a fresh Spark 1.5.0 on Ubuntu 14.04 (no spark-env.sh configured).

My my_script.py is:

from pyspark.mllib.util import MLUtils
from pyspark import SparkContext

sc = SparkContext("local", "Teste Original")
data = MLUtils.loadLibSVMFile(sc, "/home/svm_capture").toDF()

I'm running it with: ./spark-submit my_script.py

And I get this error:

Traceback (most recent call last):
File "/home/fred-spark/spark-1.5.0-bin-hadoop2.6/pipeline_teste_original.py", line 34, in <module>
data = MLUtils.loadLibSVMFile(sc, "/home/fred-spark/svm_capture").toDF()
AttributeError: 'PipelinedRDD' object has no attribute 'toDF'

What I can't understand is that if I run:

data = MLUtils.loadLibSVMFile(sc, "/home/svm_capture").toDF()

directly in the PySpark shell, it works.

Answer 1

The toDF method is a monkey patch executed inside the SparkSession constructor (the SQLContext constructor in 1.x), so to be able to use it you have to create a SparkSession (or SQLContext) first:

# SQLContext or HiveContext in Spark 1.x
from pyspark.sql import SparkSession
from pyspark import SparkContext

sc = SparkContext()

rdd = sc.parallelize([("a", 1)])
hasattr(rdd, "toDF")
## False

spark = SparkSession(sc)
hasattr(rdd, "toDF")
## True

rdd.toDF().show()
## +---+---+
## | _1| _2|
## +---+---+
## |  a|  1|
## +---+---+

Not to mention that you need a SQLContext or SparkSession to work with DataFrames in the first place.
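If the monkey-patch part sounds mysterious, here is a toy model of the mechanism in plain Python, with no Spark involved. ToyRDD, ToySession and create_data_frame are made-up names for illustration only; this is not PySpark's actual code, just a sketch of the same idea: the method does not exist on the class until some other object's constructor attaches it.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; it has no toDF method at first."""

    def __init__(self, rows):
        self.rows = list(rows)


class ToySession:
    """Hypothetical stand-in for SparkSession / SQLContext."""

    def __init__(self):
        # Side effect of construction: attach toDF to the ToyRDD class.
        def to_df(rdd):
            return self.create_data_frame(rdd)

        ToyRDD.toDF = to_df

    def create_data_frame(self, rdd):
        # A list of dicts plays the role of a DataFrame in this sketch.
        return [{"_1": key, "_2": value} for key, value in rdd.rows]


rdd = ToyRDD([("a", 1)])
before = hasattr(rdd, "toDF")  # no session has been constructed yet

session = ToySession()
after = hasattr(rdd, "toDF")   # patched on the class, so existing instances see it

df = rdd.toDF()
```

Because the patch is applied to the class rather than to one instance, an object created before the session still picks up the method afterwards. That matches what the question describes: the shell has already built a context for you, while a script submitted with spark-submit has not.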

Answer 2

Make sure you have a Spark session too.


'PipelinedRDD' object has no attribute 'toDF' in PySpark

2 answers: