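I want to visualize a scatter plot after fitting a linear regression. To do that, I am trying to build a plot from a Python pandas DataFrame (pydf) with ggplot that shows the scatter of my data together with the two regression models I fitted.
I have 3 predictors (cause, wt0, dbp0) and gfr0m as the predicted value: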
# Import numpy, pandas, and ggplot
import numpy as np
from pandas import *
from ggplot import *
# Create Python DataFrame
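# Feature indices are assumed to follow the predictor order: cause, wt0, dbp0
cause = prepared_data.map(lambda p: (p.features[0])).collect()
wt0 = prepared_data.map(lambda p: (p.features[1])).collect()
dbp0 = prepared_data.map(lambda p: (p.features[2])).collect()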
gfr0m = prepared_data.map(lambda p: (p.label)).collect()
predA = predictionsA.select("prediction").map(lambda r: r[0]).collect()
predB = predictionsB.select("prediction").map(lambda r: r[0]).collect()
pydf = DataFrame({'cause': cause, 'wt0': wt0, 'dbp0': dbp0, 'gfr0m': gfr0m,
                  'predA': predA, 'predB': predB})
# Create scatter plot and two regression models (scaling exponential) using ggplot
# The parentheses let the expression continue across lines
p = (ggplot(pydf, aes('cause','wt0','dbp0','gfr0m')) +
     geom_point(color='blue') +
     geom_line(pydf, aes('cause','wt0','dbp0','predA'), color='red') +
     geom_line(pydf, aes('cause','wt0','dbp0','predB'), color='green') +
     scale_x_log10() + scale_y_log10())
display(p)
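This code does not run. It fails with: 'DataFrame' object has no attribute 'map'
When I add .rdd before .map, I get this error instead:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 34.0 failed 1 times, most recent failure: Lost task 0.0 in stage 34.0 (TID 34, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
I would be grateful if anyone could help with this :)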
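For illustration, here is a minimal sketch of one way to build the plotting columns on the driver from rows that have already been collected, instead of mapping over the distributed data once per column. It assumes each row exposes `features` and `label`, as the code above does, and that the features are ordered cause, wt0, dbp0. The helper `rows_to_columns` and the stand-in `Row` tuple are hypothetical names, not part of Spark or pandas, and since the traceback above is cut off this is not a confirmed fix for the second error.

```python
from collections import namedtuple


def rows_to_columns(rows, feature_names, label_name):
    """Split collected rows (each with .features and .label) into named lists."""
    columns = {name: [] for name in feature_names}
    columns[label_name] = []
    for row in rows:
        for i, name in enumerate(feature_names):
            # float() turns vector elements into plain Python floats
            columns[name].append(float(row.features[i]))
        columns[label_name].append(float(row.label))
    return columns


# Stand-in rows so the sketch runs without a Spark session. With Spark,
# the rows might come from something like
#   prepared_data.select("features", "label").collect()
# where the column names "features" and "label" are assumptions.
Row = namedtuple("Row", ["features", "label"])
rows = [Row([1.0, 70.0, 80.0], 55.0), Row([2.0, 82.5, 90.0], 40.0)]

columns = rows_to_columns(rows, ["cause", "wt0", "dbp0"], "gfr0m")
```

The resulting dictionary of equal-length lists can be passed to `DataFrame(columns)` to get the same layout as `pydf`, with the `predA` and `predB` lists added as extra keys once they have been collected in the same row order.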