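Is there a way to predict on a single vector with the new spark.ml API? I would like to do this inside a map() so that I can avoid calling groupByKey() after a flatMap():
Current code (pyspark):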
# Given 'model', 'rdd', and a function 'split_element' that splits an
# element of the RDD into a list of elements (and assuming each element
# has a value and a key so that groupByKey will work to merge them later)
split_rdd = rdd.flatMap(split_element)
split_results = model.transform(split_rdd.toDF()).rdd
return split_results.groupByKey()
Desired code:
split_rdd = rdd.map(split_element)
split_results = split_rdd.map(lambda elem_list: [model.transformOne(elem) for elem in elem_list])
return split_results
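
To make the difference between the two shapes concrete, here is a minimal plain-Python sketch with no Spark involved. Both `split_element` and `predict_one` are hypothetical stand-ins: the first for my real splitter, the second for the per-element prediction that something like `transformOne` would provide. The toy data is made up for illustration.

```python
from collections import defaultdict


def split_element(element):
    """Hypothetical splitter: turn (key, values) into a list of (key, value) pairs."""
    key, values = element
    return [(key, v) for v in values]


def predict_one(value):
    """Hypothetical stand-in for a per-element prediction."""
    return value * 2


data = [("a", [1, 2]), ("b", [3])]

# Current shape: flatten everything, predict, then regroup by key.
flat = [pair for element in data for pair in split_element(element)]
predicted = [(k, predict_one(v)) for k, v in flat]
grouped = defaultdict(list)
for k, p in predicted:
    grouped[k].append(p)
current = dict(grouped)

# Desired shape: the pieces of each element stay together,
# so no regrouping step is needed afterwards.
desired = [
    [(k, predict_one(v)) for k, v in split_element(element)]
    for element in data
]
```

In the first shape the grouping is lost by the flatten step and has to be rebuilt, which is the groupByKey() I would like to avoid; in the second it is never lost.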