Question

I have a small dataset with fewer than 2000 rows. I am trying to fit a LinearRegressionModel using ML. The dataset has only one feature (which I have already normalized). After fitting the model, I use RegressionEvaluator and measure the RMSE and R2 metrics. I then noticed that the error was high, so I decided to create more artificial features to better describe the phenomenon. To do this, I created the following UDF (note that I check that it works).

numberFeatures = 12

def addFeatures(value):
    v = value.toArray()[0]
    return Vectors.dense(
        [v ** (1.0 / x) for x in xrange(2, 10)]
        + [v ** x for x in xrange(1, numberFeatures)]
    )

addFeaturesUDF = udf(addFeatures, VectorUDT())

# Here I test it
print(addFeatures(Vectors.dense(2)))
# [1.0,0.666666666667,0.5,0.4,0.333333333333,0.285714285714,0.25,0.222222222222,2.0,4.0,8.0,16.0,32.0,64.0,128.0,256.0,512.0,1024.0,2048.0]

After this, I modified my DataFrame to add more features using addFeaturesUDF, and I can display it:

dtBoosted = dt.withColumn("features", addFeaturesUDF(col("features")))
dtBoosted.show(5)
#+--------+-----+----------+--------------------+
#|    date|price|   feature|            features|
#+--------+-----+----------+--------------------+
#|733946.0| 9.92|[733946.0]|[0.0,0.0,0.0,0.0,...|
#|733948.0| 8.05|[733948.0]|[4.88997555012224...|
#|733949.0| 8.05|[733949.0]|[7.33496332518337...|
#|733950.0| 7.91|[733950.0]|[9.77995110024449...|
#|733951.0| 7.91|[733951.0]|[0.00122249388753...|
#+--------+-----+----------+--------------------+
# only showing top 5 rows

This works, but when I try to fit the model, it raises an AssertError.

What is wrong? What am I doing wrong? It worked with one feature and with some other features!
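As a sanity check outside Spark, here is a minimal sketch of the same feature expansion in plain Python. It assumes Python 3 (range instead of xrange) and plain lists instead of Vectors; add_features_posted and add_features_divided are hypothetical names used only for this illustration. It counts the generated features and compares the posted expression v ** (1.0 / x) with a division variant v * (1.0 / x).

```python
NUMBER_FEATURES = 12  # same value as numberFeatures in the question


def add_features_posted(v):
    """Expression as posted: x-th roots of v, then integer powers of v."""
    roots = [v ** (1.0 / x) for x in range(2, 10)]
    powers = [v ** x for x in range(1, NUMBER_FEATURES)]
    return roots + powers


def add_features_divided(v):
    """Hypothetical variant: v divided by x instead of the x-th root."""
    fractions = [v * (1.0 / x) for x in range(2, 10)]
    powers = [v ** x for x in range(1, NUMBER_FEATURES)]
    return fractions + powers


posted = add_features_posted(2.0)
divided = add_features_divided(2.0)

print(len(posted))   # 8 roots + 11 powers
print(posted[:3])    # first entries of the posted expression
print(divided[:3])   # first entries of the division variant
```

For v = 2 the posted expression starts at roughly 1.414, while the vector printed in the test in the question starts at 1.0, 0.666..., which is what the division variant produces. It may therefore be worth double-checking which version actually ran. With the division variant, the first eight columns are exact multiples of the original feature.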

AssertError while fitting the model

0 answers: