I need to get the elements of a VectorAssembler output as separate columns, using the Java API.
VectorAssembler assembler3 = new VectorAssembler()
    .setInputCols(new String[]{"res1", "res2"})
    .setOutputCol("res3");
DataFrame output = assembler3.transform(sensordataDF);
res1 and res2 are both vectors of doubles. Can anyone guide me on how to do this?
Answer 0: (score: 1)
The output DataFrame will be sensordataDF with a new column called res3; it will also still contain the columns res1 and res2.
Edit: This could perhaps be done by casting the column to a string, splitting it with split from spark.sql.functions, and then casting each piece back to DoubleType.
I use Spark with Python, but it should be nearly the same in Java.
Example:
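from pyspark.sql.functions import split

# Split res3 on commas (after it has been cast to a string, as described above)
split_col = split(output['res3'], ',')
df = output.withColumn('first_data', split_col.getItem(0))
df = df.withColumn('second_data', split_col.getItem(1))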
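Since the question asks about Java, here is a rough sketch of the "cast to string, split, cast back to double" step in plain Java, with no Spark dependency. It assumes the vector's string form looks like [1.0,2.0] (brackets around comma-separated values); that format is an assumption you should check against your own data. VectorStringSplitter and parse are hypothetical names made up for this illustration, not part of the Spark API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: parses the assumed string form of a
// vector, e.g. "[1.0,2.0]", into its individual double elements.
final class VectorStringSplitter {

    private VectorStringSplitter() {}

    static double[] parse(String vectorText) {
        String trimmed = vectorText.trim();
        // Drop the surrounding brackets if present (assumed format).
        if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        // An empty vector has no elements to parse.
        if (trimmed.trim().isEmpty()) {
            return new double[0];
        }
        // Split on commas and convert each piece back to a double.
        return Arrays.stream(trimmed.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }
}
```

Something like VectorStringSplitter.parse("[1.0,2.5]") would give you the two values, which you could then write into first_data and second_data. The brackets are the main thing to watch for: if they are left in, the first and last pieces will not parse as doubles.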