PySpark: Converting SparseVector to Array SQL type raises net.razorvine.pickle.PickleException

Time: 2019-01-30 05:49:21

Tags: pyspark pyspark-sql

pyspark: 2.3.2

Creating a DataFrame from the Spark example data:

import os

input_path = os.path.join(this_script_dir, "data", "sample_libsvm_data.txt")
training_data = self.spark.read.format("libsvm").load(input_path)

There is a "features" column that contains SparseVector values. The schema is shown below:

<class 'list'>: [StructField(features,VectorUDT,true)]

I am converting the column using the following:

import numpy
from pyspark.sql.types import ArrayType, FloatType

spark.udf.register("sparseToArray", lambda x: numpy.array(x.toArray()), ArrayType(elementType=FloatType(), containsNull=False))
sql = "sparseToArray(features) as features"
data = training_data.selectExpr(sql)

After the conversion, the schema shows as:

StructField(features,ArrayType(FloatType,false),true)

Calling data.collect() raises net.razorvine.pickle.PickleException.
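
One possible cause, offered as an assumption rather than a confirmed diagnosis: the sparseToArray UDF returns a numpy.ndarray, and Spark's Python-to-JVM deserializer generally cannot rebuild numpy objects, which tends to surface as a PickleException. A minimal sketch of a workaround is to return a plain Python list of built-in floats instead. The helper names sparse_to_list and register_sparse_to_array are hypothetical, and DoubleType is chosen here (instead of the FloatType used above) only because SparseVector values are 64-bit floats.

```python
def sparse_to_list(vec):
    """Convert an object exposing toArray() (for example a
    pyspark.ml.linalg.SparseVector) into a plain list of built-in floats."""
    if vec is None:
        return None
    # float(...) turns each numpy.float64 element into a built-in Python float,
    # so nothing numpy-specific is handed back to Spark.
    return [float(v) for v in vec.toArray()]


def register_sparse_to_array(spark):
    """Register the converter as a SQL UDF on an existing SparkSession.

    Hypothetical helper: `spark` is assumed to be the session used above.
    """
    # Imported lazily so sparse_to_list can be used without pyspark installed.
    from pyspark.sql.types import ArrayType, DoubleType

    spark.udf.register(
        "sparseToArray",
        sparse_to_list,
        ArrayType(elementType=DoubleType(), containsNull=False),
    )
```

With this in place, the query would stay the same: call register_sparse_to_array(spark), then training_data.selectExpr("sparseToArray(features) as features"). Whether this resolves the exception in the setup described here has not been verified.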

0 answers:

No answers yet