AttributeError: 'DataFrame' object has no attribute 'get' on VectorAssembler in Spark ML

Asked: 2016-05-04 22:45:35

Tags: python apache-spark pyspark apache-zeppelin

I am trying to follow the example discussed here, and I simply copied the code into a Zeppelin paragraph.

%pyspark
import pandas as pd
from pyspark.sql import SQLContext
from pyspark.ml.feature import VectorAssembler
from pyspark.mllib.linalg import Vectors

dataset = sqlContext.createDataFrame(
    [(0, 18, 1.0, Vectors.dense([0.0, 10.0, 0.5]), 1.0)],
    ["id", "hour", "mobile", "userFeatures", "clicked"])
print(type(dataset))
assembler = VectorAssembler(
    inputCols=["hour", "mobile", "userFeatures"],
    outputCol="features")
output = assembler.transform(dataset)

However, I get this error:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 164, in <module>
    intp.setStatementsFinished(output.get(), False)
  File "/home/zeppelin/zeppelin-0.5.5-incubating-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/dataframe.py", line 749, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'get'

Any suggestions?

1 Answer:

Answer 0 (score: 0)

You could try changing

from pyspark.mllib.linalg import Vectors

to

from pyspark.ml.linalg import Vectors
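
A note on the traceback in the question: the failing frame is line 164 of /tmp/zeppelin_pyspark.py, which calls output.get(), and not anything inside VectorAssembler. That suggests the paragraph's own variable named output may have replaced an object of the same name that the Zeppelin PySpark script uses internally. The sketch below is a simplified pure-Python simulation of that kind of name collision, under that assumption. FakeDataFrame, OutputBuffer, run_paragraph and try_paragraph are hypothetical stand-ins invented for illustration; none of this is Zeppelin or Spark source code.

```python
# Simplified simulation of a name collision between user code and an
# interpreter script that shares its namespace. NOT Zeppelin's real code.


class FakeDataFrame(object):
    """Hypothetical stand-in for a DataFrame: unknown attributes raise."""

    def __getattr__(self, name):
        raise AttributeError(
            "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))


class OutputBuffer(object):
    """Hypothetical stand-in for the interpreter's captured-output object."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

    def get(self):
        return "".join(self.chunks)


def run_paragraph(user_code):
    """Run user code in a namespace that already holds the script's `output`."""
    namespace = {"output": OutputBuffer(), "FakeDataFrame": FakeDataFrame}
    exec(user_code, namespace)
    # Same shape as the failing call in the traceback: output.get()
    return namespace["output"].get()


def try_paragraph(user_code):
    """Return a short status string instead of letting the error propagate."""
    try:
        return "ok: %r" % run_paragraph(user_code)
    except AttributeError as err:
        return "AttributeError: %s" % err


# User code that reuses the name `output` overwrites the buffer.
shadowing = try_paragraph("output = FakeDataFrame()")
# User code that picks a different name leaves the buffer alone.
renamed = try_paragraph("assembled = FakeDataFrame()")

print(shadowing)
print(renamed)
```

If the same thing is happening in the real paragraph, giving the result a different name (for example assembled = assembler.transform(dataset), where assembled is just an illustrative name) would be worth trying. Also keep in mind that pyspark.ml.linalg is only available from Spark 2.0 onward, so the import change may not apply to older installations.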