Is there a way to save a PySpark DataFrame in ARFF format?

Time: 2016-05-05 14:41:29

Tags: python apache-spark pyspark

I have prepared a DataFrame and transformed it with VectorAssembler so that it can be used with MLlib:

from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import DecisionTreeClassifier

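# Index the string label column into numeric class indices.
target_index = StringIndexer(inputCol="target", outputCol="target_idx").fit(df)
# Assemble every remaining column into a single feature vector.
assembler = VectorAssembler(
    inputCols=[
        x for x in df.columns if x not in ['target', 'ident_1', 'id_l', 'target_idx']
    ],
    outputCol='features'
)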

cl = DecisionTreeClassifier(labelCol='target_idx', featuresCol='features')
pipe = Pipeline(stages=[target_index, assembler, cl])
model = pipe.fit(df_train)
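# stages[1] is the VectorAssembler itself; call transform() to get the DataFrame.
df_transformed = model.stages[1].transform(df_train)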

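Now I want to write the transformed dataset to an ARFF file. Is there a way to write a PySpark DataFrame that has already been transformed by VectorAssembler to ARFF format?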
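For reference, as far as I know PySpark has no built-in ARFF writer, so one option is to serialize the rows by hand. Below is a minimal sketch, not a definitive implementation. The helper name `write_arff` and its parameters are hypothetical. It assumes dense numeric features, a nominal class label, attribute names that need no quoting, and data small enough to stream through the driver.

```python
import io


def write_arff(out, relation, feature_names, rows, class_values):
    """Write numeric features plus a nominal class label as ARFF text.

    Hypothetical helper: `rows` is an iterable of (features, label) pairs,
    where `features` is a sequence of numbers and `label` is one of
    `class_values`. Names are assumed to need no ARFF quoting.
    """
    # Header: relation name, one NUMERIC attribute per feature, then the class.
    out.write("@RELATION %s\n\n" % relation)
    for name in feature_names:
        out.write("@ATTRIBUTE %s NUMERIC\n" % name)
    out.write("@ATTRIBUTE target {%s}\n\n" % ",".join(class_values))

    # Data section: one comma-separated line per row.
    out.write("@DATA\n")
    for features, label in rows:
        if len(features) != len(feature_names):
            raise ValueError("row has %d features, expected %d"
                             % (len(features), len(feature_names)))
        values = [repr(float(v)) for v in features]
        out.write(",".join(values + [str(label)]) + "\n")


# Small in-memory demonstration with made-up values.
buf = io.StringIO()
write_arff(
    buf,
    relation="demo",
    feature_names=["a", "b"],
    rows=[([1.0, 2.5], "yes"), ([0.0, 3.0], "no")],
    class_values=["yes", "no"],
)
print(buf.getvalue())
```

With an assembled DataFrame (call it `assembled`, a hypothetical name), the rows could be produced with something like `assembled.select('features', 'target').rdd.map(lambda r: (r['features'].toArray().tolist(), r['target'])).toLocalIterator()`, and the feature names taken from `assembler.getInputCols()`. Since `toArray()` densifies sparse vectors, very wide sparse data would probably be better served by the sparse ARFF syntax, which this sketch does not cover.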

0 answers:

No answers