Spark Dataset from a List

Time: 2018-02-25 12:03:42

Tags: arrays apache-spark machine-learning dataset double

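I need to create a Spark Dataset for ML. I have an array of 100 Double values, and I want to put them into a Dataset with 100 columns (one value per column).

How can I do this? Thanks.

Edit: code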

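import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.RowFactory

// sess is the application's SparkSession
import sess.implicits._

val values = new ListBuffer[Double]()

// Values population process ....

// toDS() on a Seq[Double] yields one row per element in a single column named "value"
val ds = values.toDS()

ds.show()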

The output is:

+--------+
|   value|
+--------+
| 27242.0|
| 33883.0|
| 69727.0|
| 20851.0|
| 27740.0|
| 18747.0|

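1 Answer:

Answer 0 (score: 0)

There are many ways to achieve this. One of them is to build a schema with one column per value, wrap the array of 100 doubles in a single Row and parallelize it into an RDD[Row], and finally use the createDataFrame API to form the DataFrame.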

// necessary imports
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, DoubleType}

// forming a list of 100 doubles (1.0 to 100.0)
val values = new ListBuffer[Double]()
for (x <- 1 to 100) {
  values += x.toDouble
}

// creating a schema with one nullable DoubleType column per value;
// each column is named "col" + value with "." replaced by "_", e.g. col1_0
val schema = StructType(values.map(value => StructField(("col" + value).replace(".", "_"), DoubleType, true)))

// finally creating a one-row DataFrame with each value in its own column
// (sqlContext and sc are the variables predefined in spark-shell)
val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row.fromSeq(values.toSeq))), schema)
df.show(false)

This should give you:

+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
|col1_0|col2_0|col3_0|col4_0|col5_0|col6_0|col7_0|col8_0|col9_0|col10_0|col11_0|col12_0|col13_0|col14_0|col15_0|col16_0|col17_0|col18_0|col19_0|col20_0|col21_0|col22_0|col23_0|col24_0|col25_0|col26_0|col27_0|col28_0|col29_0|col30_0|col31_0|col32_0|col33_0|col34_0|col35_0|col36_0|col37_0|col38_0|col39_0|col40_0|col41_0|col42_0|col43_0|col44_0|col45_0|col46_0|col47_0|col48_0|col49_0|col50_0|col51_0|col52_0|col53_0|col54_0|col55_0|col56_0|col57_0|col58_0|col59_0|col60_0|col61_0|col62_0|col63_0|col64_0|col65_0|col66_0|col67_0|col68_0|col69_0|col70_0|col71_0|col72_0|col73_0|col74_0|col75_0|col76_0|col77_0|col78_0|col79_0|col80_0|col81_0|col82_0|col83_0|col84_0|col85_0|col86_0|col87_0|col88_0|col89_0|col90_0|col91_0|col92_0|col93_0|col94_0|col95_0|col96_0|col97_0|col98_0|col99_0|col100_0|
+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
|1.0   |2.0   |3.0   |4.0   |5.0   |6.0   |7.0   |8.0   |9.0   |10.0   |11.0   |12.0   |13.0   |14.0   |15.0   |16.0   |17.0   |18.0   |19.0   |20.0   |21.0   |22.0   |23.0   |24.0   |25.0   |26.0   |27.0   |28.0   |29.0   |30.0   |31.0   |32.0   |33.0   |34.0   |35.0   |36.0   |37.0   |38.0   |39.0   |40.0   |41.0   |42.0   |43.0   |44.0   |45.0   |46.0   |47.0   |48.0   |49.0   |50.0   |51.0   |52.0   |53.0   |54.0   |55.0   |56.0   |57.0   |58.0   |59.0   |60.0   |61.0   |62.0   |63.0   |64.0   |65.0   |66.0   |67.0   |68.0   |69.0   |70.0   |71.0   |72.0   |73.0   |74.0   |75.0   |76.0   |77.0   |78.0   |79.0   |80.0   |81.0   |82.0   |83.0   |84.0   |85.0   |86.0   |87.0   |88.0   |89.0   |90.0   |91.0   |92.0   |93.0   |94.0   |95.0   |96.0   |97.0   |98.0   |99.0   |100.0   |
+------+------+------+------+------+------+------+------+------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+--------+
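Because the answer derives each column name from its value, two equal values would produce duplicate column names, which is likely with real data such as the values in the question. A minimal sketch of the same technique (explicit schema plus a single Row) using the Spark 2.x `SparkSession` API instead of `sqlContext`/`sc`, naming columns by position instead. The object name `WideRow`, the method `toWideDataFrame`, and the `colN` naming scheme are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical helper: turns a Seq[Double] into a one-row DataFrame
// with one DoubleType column per value.
object WideRow {
  def toWideDataFrame(spark: SparkSession, values: Seq[Double]): DataFrame = {
    // Name columns by position (col0, col1, ...) rather than by value,
    // so repeated values cannot produce duplicate column names.
    val schema = StructType(
      values.indices.map(i => StructField(s"col$i", DoubleType, nullable = false))
    )
    // A single Row holding all values, one per column.
    val rows = java.util.Collections.singletonList(Row.fromSeq(values))
    // createDataFrame(java.util.List[Row], StructType) needs no RDD.
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell reuse the existing `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wide-row")
      .getOrCreate()

    // Stand-in for the 100 Doubles from the question.
    val values: Seq[Double] = (1 to 100).map(_.toDouble)
    val df = toWideDataFrame(spark, values)
    df.printSchema()
    df.show(false)

    spark.stop()
  }
}
```

If the goal is to feed Spark ML, note that most estimators read a single vector column (by default named `features`), so the 100 columns would typically be combined afterwards with `VectorAssembler`.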