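I am trying to run a linear regression in PySpark using a DataFrame, but even after using a function to build the features and label, it still gives me an error. Can someone help me figure out how to run a linear regression on a DataFrame?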
import pyspark.mllib
import pyspark.mllib.regression
from pyspark.mllib.regression import LabeledPoint
from pyspark.sql.functions import *
from pyspark.sql import Row
from pyspark.ml.linalg import Vectors
#from pyspark.ml.regression import LinearRegression
My data looks like this:
df_all_shorted.head(2)
[Row(bonica_rid=u'cand1457', party=100, vote_date=u'2001-01-03', vote_choice=6, vs_idealPoint=-0.514169271337908, vs_cuttingpoint=-0.514169271337908, vs_rcdir=1, fecyear_new=u'1992', Cand_ID_new=u'H2MA11060', state_new=u'MA', recipient_cfscore_new=-0.758, num_givers_total_new=1533, cand_gender_new=u'M', total_receipts_new=169089.0, total_indiv_contrib_new=105870.0, total_pac_contribs_new=0.0, ran_primary_new=1, ran_general_new=1, district_partisanship_new=-0.119),
Row(bonica_rid=u'cand1457', party=100, vote_date=u'2001-01-03', vote_choice=6, vs_idealPoint=-0.514169271337908, vs_cuttingpoint=-0.514169271337908, vs_rcdir=1, fecyear_new=u'1992', Cand_ID_new=u'H2MA11060', state_new=u'MA', recipient_cfscore_new=-0.758, num_givers_total_new=1533, cand_gender_new=u'M', total_receipts_new=0.0, total_indiv_contrib_new=0.0, total_pac_contribs_new=0.0, ran_primary_new=0, ran_general_new=0, district_partisanship_new=-0.119)]
and
training = df_all_shorted.map(lambda line: LabeledPoint(line[0], [line[1:]]))
I tried this code and got the error
AttributeError: 'DataFrame' object has no attribute 'map'
so I changed it to
training = df_all_shorted.rdd.map(lambda line:LabeledPoint(line[0],[line[1:]]))
and it worked, but when I run
lr = LinearRegression()\
.setMaxIter(10)\
.setRegParam(0.3)\
.setElasticNetParam(0.8)
lrModel = lr.fit(training)
an error occurs:
AttributeError: 'PipelinedRDD' object has no attribute '_jdf'
Answer 0 (score: 0)
You are getting this error because the LinearRegression you are trying to use comes from pyspark.ml rather than pyspark.mllib. The pyspark.ml estimator's fit() expects a DataFrame, so passing it an RDD of LabeledPoint fails with the '_jdf' error. Even after you comment out the line from pyspark.ml.regression import LinearRegression, your global namespace still recognizes LinearRegression as coming from the pyspark.ml module. Restart the session and run it again.
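The point about the global namespace can be reproduced without Spark. The following is a minimal sketch, assuming an interactive session (notebook or REPL) behaves like one shared dictionary of names. The `session` dictionary and the use of `json.dumps` are stand-ins chosen only so the example runs anywhere; in the question, the lingering name would be `LinearRegression` from `pyspark.ml.regression`.

```python
# Simulate an interactive session: every cell runs against the same namespace.
session = {}

# First run: the import executes and binds the name in the session namespace.
exec("from json import dumps", session)

# Later run: the import line has been commented out and the cell is re-run.
# Commenting a line out only stops it from running again; it does not
# remove the binding created by the earlier run.
exec("# from json import dumps", session)

still_bound = "dumps" in session
origin = session["dumps"].__module__
print(still_bound, origin)

# A restart corresponds to a fresh namespace, where the name is gone.
fresh_session = {}
exec("# from json import dumps", fresh_session)
bound_after_restart = "dumps" in fresh_session
print(bound_after_restart)
```

In a live session, `LinearRegression.__module__` is a quick way to check which module a lingering name actually came from before deciding to restart.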