I have created an ALS model and called its .transform(test_data) method. I now want to view the predictions generated for the data.
userRecs.printSchema()
produces:
|-- ProductID: integer (nullable = false)
|-- recommendations: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- CustomerID: integer (nullable = true)
|    |    |-- rating: float (nullable = true)
Calling userRecs.first() causes the process to hang at "Stage 4":
[Stage 4:> (0 + 1) / 1]
Am I handling or reading the data incorrectly? I am also not sure why calling userRecs.first() needs any further processing at all.
import pandas as pd
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.ml.tuning import TrainValidationSplit, ParamGridBuilder
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql.functions import explode
sc = SparkContext('local')
spark = SparkSession(sc)
# load the data
data = pd.read_csv('matric-out-small-SMALL.csv', sep=',')
df = spark.createDataFrame(data)
(training, test) = df.randomSplit([0.8, 0.2])  # a seed (e.g. 50) can be passed as a second argument
# load the previously saved model
model = ALSModel.load("modelSaveOut")
# predict on the test data (keep the result, otherwise it is discarded)
predictions = model.transform(test)
# Generate top 3 recommendations for each user
userRecs = model.recommendForAllUsers(3)
userRecs.printSchema()
userRecs.first()
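Also, is there another way to have the model give a prediction for just a single data point? I am sure there is a better solution for getting the prediction for one specific value.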
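To make the single-point case concrete, here is a minimal sketch of what I understand an ALS prediction to be: the dot product of the user's and the item's latent factor vectors, which ALSModel exposes as the userFactors and itemFactors DataFrames (columns id and features). The function name predict_rating, the plain dicts, and the factor values below are hypothetical and only for illustration; they are not taken from my model. An id that is missing from the factors gives NaN, which mirrors Spark's default coldStartStrategy of "nan".

```python
import math


def predict_rating(user_factors, item_factors, user_id, item_id):
    """Return the dot product of the two latent vectors, or NaN for an unseen id."""
    u = user_factors.get(user_id)
    v = item_factors.get(item_id)
    if u is None or v is None:
        # Unseen user or item: no factors were learned for it.
        return math.nan
    return sum(a * b for a, b in zip(u, v))


# Hypothetical rank-2 factors, keyed by id (made-up values for illustration).
user_factors = {1: [1.0, 2.0], 2: [0.5, 0.0]}
item_factors = {10: [0.5, 0.25], 20: [2.0, 1.0]}

print(predict_rating(user_factors, item_factors, 1, 10))  # 1.0
```

In Spark itself, I would presumably get the same number by building a one-row DataFrame that holds the two id columns the model was trained on and passing it to model.transform, but I have not confirmed that this avoids the hang.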