I get this error when trying to convert the values in the surrogateDF attribute of a pyspark.ml.feature.ImputerModel to a Python list:
  File "D:\repos\onnxmltools\onnxmltools\convert\sparkml\operator_converters\Imputer.py", line 21, in convert_imputer
    surrogates = op.surrogateDF.toPandas().values[0].tolist()
  File "C:\Users\jeff\AppData\Local\Continuum\anaconda3\envs\py3.6\lib\site-packages\pyspark\sql\dataframe.py", line 1968, in toPandas
    pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
  File "C:\Users\jeff\AppData\Local\Continuum\anaconda3\envs\py3.6\lib\site-packages\pyspark\sql\dataframe.py", line 465, in collect
    with SCCallSiteSync(self._sc) as css:
  File "C:\Users\jeff\AppData\Local\Continuum\anaconda3\envs\py3.6\lib\site-packages\pyspark\traceback_utils.py", line 72, in __enter__
    self._context._jsc.setCallSite(self._call_site)
AttributeError: 'NoneType' object has no attribute 'setCallSite'
The code is shown below. Oddly, executing model.surrogateDF.show() actually prints the correct values.
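from pyspark.ml.feature import Imputer  # import needed by this snippet
# Two columns with missing values (NaN) for the Imputer to fill.
data = self.spark.createDataFrame([
    (1.0, float("nan")),
    (2.0, float("nan")),
    (float("nan"), 3.0),
    (4.0, 4.0),
    (5.0, 5.0)
], ["a", "b"])
imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
model = imputer.fit(data)
# Same conversion as the line that fails in the traceback.
surrogates = model.surrogateDF.toPandas().values[0].tolist()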
model.surrogateDF.show() prints:
+---+---+
|  a|  b|
+---+---+
|3.0|4.0|
+---+---+
I also tried other ways of getting at the values, such as going through the RDD, and it makes no difference.
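For what it's worth, the last two frames of the traceback above can be mimicked in plain Python without Spark. The sketch below only illustrates what the error message itself implies: the `_jsc` attribute on the context is `None` at the moment `setCallSite` is called. `FakeContext`, `CallSiteSync`, and `try_collect` are hypothetical stand-ins written for this illustration, not pyspark classes, and the sketch does not establish why `_jsc` would be `None` in the real session.

```python
class FakeContext:
    """Hypothetical stand-in for a context object whose JVM handle may be missing."""

    def __init__(self, jsc=None):
        # None mimics the state implied by the error message.
        self._jsc = jsc


class CallSiteSync:
    """Hypothetical stand-in for the context manager seen in the traceback."""

    def __init__(self, context, call_site="collect"):
        self._context = context
        self._call_site = call_site

    def __enter__(self):
        # Same attribute chain as the failing line in traceback_utils.py.
        self._context._jsc.setCallSite(self._call_site)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Do not suppress exceptions raised inside the with-block.
        return False


def try_collect(context):
    """Return the error text raised on entering the block, or None on success."""
    try:
        with CallSiteSync(context):
            return None
    except AttributeError as exc:
        return str(exc)


message = try_collect(FakeContext(jsc=None))
print(message)
```

With `jsc=None` the stand-in reports the same kind of AttributeError as the traceback, which suggests looking at the state of the Spark context attached to `surrogateDF` rather than at the `toPandas()` conversion itself.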