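I need to create a Spark Dataset for ML. I have an array of 100 Double values, and I want to put them into a Dataset with 100 columns (one value per column).
How can I do this? Thanks.
Edit: code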
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.RowFactory
import sess.implicits._
val values = new ListBuffer[Double]()
// Values population process ....
val ds = values.toDS()
ds.show()
The output is displayed as:
+--------+
|   value|
+--------+
| 27242.0|
| 33883.0|
| 69727.0|
| 20851.0|
| 27740.0|
| 18747.0|
Answer 0 (score: 0)
There are many ways to do what you need. One of them is to build a schema, wrap the array of 100 doubles in a single Row inside an RDD[Row], and finally use the createDataFrame API to build the dataframe.
// necessary imports
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
import org.apache.spark.sql.SQLContext
// forming array of 100 doubles
val values = new ListBuffer[Double]()
for (x <- 1 to 100) {
  values += x.toDouble
}
// creating schema for the 100 doubles; column names are derived from the values (1.0 becomes col1_0)
val schema = StructType(values.map(value => StructField(("col" + value).replace(".", "_"), DoubleType, true)).toSeq)
// finally creating the dataframe with a single row, one value per column
// (assumes a SparkContext `sc` and an SQLContext `sqlContext` are already in scope)
val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row.fromSeq(values.toSeq))), schema)
df.show(false)
This should give you:
+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
|col1_0|col2_0|col3_0|col4_0|col5_0|col6_0|col7_0|col8_0|col9_0|col10_0|col11_0|col12_0|col13_0|col14_0|col15_0|col16_0|col17_0|col18_0|col19_0|col20_0|col21_0|col22_0|col23_0|col24_0|col25_0|col26_0|col27_0|col28_0|col29_0|col30_0|col31_0|col32_0|col33_0|col34_0|col35_0|col36_0|col37_0|col38_0|col39_0|col40_0|col41_0|col42_0|col43_0|col44_0|col45_0|col46_0|col47_0|col48_0|col49_0|col50_0|col51_0|col52_0|col53_0|col54_0|col55_0|col56_0|col57_0|col58_0|col59_0|col60_0|col61_0|col62_0|col63_0|col64_0|col65_0|col66_0|col67_0|col68_0|col69_0|col70_0|col71_0|col72_0|col73_0|col74_0|col75_0|col76_0|col77_0|col78_0|col79_0|col80_0|col81_0|col82_0|col83_0|col84_0|col85_0|col86_0|col87_0|col88_0|col89_0|col90_0|col91_0|col92_0|col93_0|col94_0|col95_0|col96_0|col97_0|col98_0|col99_0|col100_0|
+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
|1.0 |2.0 |3.0 |4.0 |5.0 |6.0 |7.0 |8.0 |9.0 |10.0 |11.0 |12.0 |13.0 |14.0 |15.0 |16.0 |17.0 |18.0 |19.0 |20.0 |21.0 |22.0 |23.0 |24.0 |25.0 |26.0 |27.0 |28.0 |29.0 |30.0 |31.0 |32.0 |33.0 |34.0 |35.0 |36.0 |37.0 |38.0 |39.0 |40.0 |41.0 |42.0 |43.0 |44.0 |45.0 |46.0 |47.0 |48.0 |49.0 |50.0 |51.0 |52.0 |53.0 |54.0 |55.0 |56.0 |57.0 |58.0 |59.0 |60.0 |61.0 |62.0 |63.0 |64.0 |65.0 |66.0 |67.0 |68.0 |69.0 |70.0 |71.0 |72.0 |73.0 |74.0 |75.0 |76.0 |77.0 |78.0 |79.0 |80.0 |81.0 |82.0 |83.0 |84.0 |85.0 |86.0 |87.0 |88.0 |89.0 |90.0 |91.0 |92.0 |93.0 |94.0 |95.0 |96.0 |97.0 |98.0 |99.0 |100.0 |
+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
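As a variation on the approach above, here is a minimal sketch that uses SparkSession (the entry point in Spark 2.x and later) and names the columns by position instead of by value, so that repeated values in the input cannot produce duplicate column names. The local master, the app name, and the col0 to col99 naming are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Assumption: a local session for illustration; reuse your existing session if you have one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("wide-row-sketch")
  .getOrCreate()

// Stand-in for the 100 values from the question.
val values: Seq[Double] = (1 to 100).map(_.toDouble)

// Name each column by its position (col0 ... col99), independent of the value it holds.
val schema = StructType(
  values.indices.map(i => StructField(s"col$i", DoubleType, nullable = false))
)

// A single Row holding all values, wrapped in a one-element RDD[Row].
val rows = spark.sparkContext.parallelize(Seq(Row.fromSeq(values)))
val df = spark.createDataFrame(rows, schema)

df.printSchema()
```

If more arrays arrive later, each one can become another Row in the same RDD, as long as every Row has the same number of values as the schema has fields.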