filtered_data = filtered_data.map(lambda x: log(x.label))
filtered_data is a LabeledPoint RDD used as the input to a linear regression model. Each record has two fields: a label and a feature vector.
Example - [LabeledPoint(1.0, [0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]),
LabeledPoint(1.0, [0.777778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]),
LabeledPoint(1.0, [0.95,0.162791,0.0,0.0,0.0,0.0,0.0,0.0,0.6976744186,0.0,0.0,0.0,0.0,0.0]),
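For context, filtered_data is built roughly like this. The raw rows and the parsing step below are assumptions for illustration (only the resulting LabeledPoints above come from my actual data), and the feature vectors are shortened to four entries for readability:

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="regression-prep")

# Hypothetical pre-scaled rows: (label, feature values).
rows = sc.parallelize([
    (1.0, [0.4, 0.0, 0.0, 0.0]),
    (1.0, [0.777778, 0.0, 0.0, 0.0]),
    (1.0, [0.95, 0.162791, 0.6976744186, 0.0]),
])

# Wrap each row in a LabeledPoint so it can feed the linear regression model.
filtered_data = rows.map(lambda r: LabeledPoint(r[0], r[1]))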
The line at the top is the command I am trying to execute. It fails with the following error: AttributeError: 'NoneType' object has no attribute '_jvm'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
When I try any other operation instead (for example, addition or subtraction), it executes successfully.
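For example, a sketch of the kind of map that does work on the same filtered_data:

# Plain Python arithmetic inside the lambda runs without any error.
labels_plus_one = filtered_data.map(lambda x: x.label + 1.0)
labels_plus_one.take(3)

# Only the log(x.label) version shown at the top raises the AttributeError about '_jvm'.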