Getting a 'NoneType' error when applying statistical functions in pyspark to an RDD

Asked: 2016-08-26 06:56:52

Tags: pyspark rdd

filtered_data = filtered_data.map(lambda x: log(x.label))

filtered_data is an RDD of LabeledPoint objects, used as input to a linear regression model. Each element consists of 2 fields - a label and a feature vector.

Example - [LabeledPoint(1.0, [0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]), LabeledPoint(1.0, [0.777778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]), LabeledPoint(1.0, [0.95,0.162791,0.0,0.0,0.0,0.0,0.0,0.0,0.6976744186,0.0,0.0,0.0,0.0,0.0]),

This is the command I am trying to execute. It gives the following error: AttributeError: 'NoneType' object has no attribute '_jvm'

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

When I try any other operation in the lambda (e.g. addition or subtraction), it executes successfully.
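Since plain arithmetic inside the lambda succeeds while log fails, the difference is likely which log is in scope. A minimal sketch of one common cause (an assumption, not confirmed by the question): a wildcard import of pyspark.sql.functions shadows Python's math.log with a Column-expression version that needs a live SparkContext, which is None on the executors. Calling plain math.log on the float labels avoids this.

```python
# Likely cause (assumed): after `from pyspark.sql.functions import *`,
# `log` refers to a Column-expression function that requires a JVM-backed
# SparkContext. Inside an RDD lambda running on a worker, that context is
# None, so the call fails with AttributeError on `_jvm`.
# Plain math.log operates on ordinary floats and works on workers.
from math import log

# Stand-in for the labels of the LabeledPoint RDD shown above.
labels = [1.0, 1.0, 1.0]

# The computation the map() intends: natural log of each label.
logged = [log(x) for x in labels]
print(logged)  # log(1.0) == 0.0 for every element
```

In the actual job, the equivalent would be to import math.log under its own name (or call math.log directly) inside the function passed to map, rather than relying on whatever log the driver's imports left in scope.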

0 Answers:

No answers