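I want to convert a DataFrame that contains Double values into a List so that I can use it for calculations. What would you suggest so that I end up with a list of the right type (i.e. Double)?
My approach is this: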
var newList = myDataFrame.collect().toList
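But it returns the type List[org.apache.spark.sql.Row], and I don't know what exactly that is!
Is it possible to skip this step and simply pass my DataFrame to a function and compute from it? (For example, I want to compare the third element of the second column with a specific double. Can I do that comparison directly on the DataFrame?)
Either way, I need to understand how to create a list of the right type every time!
Edit:
Input DataFrame: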
+---+---+
|_c1|_c2|
+---+---+
|0 |0 |
|8 |2 |
|9 |1 |
|2 |9 |
|2 |4 |
|4 |6 |
|3 |5 |
|5 |3 |
|5 |9 |
|0 |1 |
|8 |9 |
|1 |0 |
|3 |4 |
|8 |7 |
|4 |9 |
|2 |5 |
|1 |9 |
|3 |6 |
+---+---+
Result after conversion:
List((0,0), (8,2), (9,1), (2,9), (2,4), (4,6), (3,5), (5,3), (5,9), (0,1), (8,9), (1,0), (3,4), (8,7), (4,9), (2,5), (1,9), (3,6))
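But every element in the list must be of type Double.
Answer 0 (score: 2)
You can cast the columns you need to Double, convert the DataFrame to an RDD, and collect it.
If some of the data cannot be parsed, you can clean it with a udf before converting it to double: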
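import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.DoubleType

// Parse a string as a double; input that cannot be parsed becomes NaN.
val stringToDouble = udf((data: String) => {
  Try(data.toDouble) match {
    case Success(value) => value
    case Failure(_) => Double.NaN
  }
})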
val df = Seq(
  ("0.000", "0"),
  ("0.000008", "24"),
  ("9.00000", "1"),
  ("-2", "xyz"),
  ("2adsfas", "1.1.1")
).toDF("a", "b")
  .withColumn("a", stringToDouble($"a").cast(DoubleType))
  .withColumn("b", stringToDouble($"b").cast(DoubleType))
After this, the output is:
+------+----+
|a |b |
+------+----+
|0.0 |0.0 |
|8.0E-6|24.0|
|9.0 |1.0 |
|-2.0 |NaN |
|NaN |NaN |
+------+----+
To get an Array[(Double, Double)]:
val result = df.rdd.map(row => (row.getDouble(0), row.getDouble(1))).collect()
The result is an Array[(Double, Double)].
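Once the values have been collected, the comparison asked about in the question (the third element of the second column against a given double) is ordinary Scala collection work. The following is only a minimal sketch: the sample values are the first four rows of the question's DataFrame, and `threshold` is a made-up value for illustration.

```scala
// Stand-in for the collected array; values taken from the first rows of the question.
val result: Array[(Double, Double)] =
  Array((0.0, 0.0), (8.0, 2.0), (9.0, 1.0), (2.0, 9.0))

// Convert to an immutable List[(Double, Double)].
val pairs: List[(Double, Double)] = result.toList

// Extract the second column as a List[Double].
val secondColumn: List[Double] = pairs.map(_._2)

// Third element of the second column (index 2) compared with a hypothetical threshold.
val threshold = 0.5
val thirdIsLarger: Boolean = secondColumn(2) > threshold
```

Because `pairs` and `secondColumn` are plain Scala lists of doubles, no further casting is needed in later calculations.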
Answer 1 (score: 0)
Convert the DataFrame to a Dataset using a case class, then convert it to a list.
This returns a list of objects of your class. All the fields of the class (which map to the columns of your table) are already typed, so you won't need to cast every time.
Run the code below to check it.
Sample to check and verify (Scala):
val wa = Array("one","two","two")
val wr = sc.parallelize(wa,3).map(x=>(x,"x",1))
val wdf = wr.toDF("a","b","c")
case class wc(a:String,b:String,c:Int)
val wds = wdf.as[wc] // convert the DataFrame to a typed Dataset[wc]
val myList= wds.collect.toList
myList.foreach(x=>println(x))
myList.foreach(x=>println(x.a.getClass,x.b.getClass,x.c.getClass))
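Applied to the question's two-column data, the same idea might look like the sketch below. It assumes the columns are named `_c1` and `_c2` as in the question and that they hold integers which need casting first. The case class `Point`, the object `DatasetToList`, and the local `SparkSession` setup are illustrative names and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case class; its field names must match the column names.
case class Point(_c1: Double, _c2: Double)

object DatasetToList {
  // Cast both columns to double, view each row as a Point, and collect to a List.
  def toPoints(df: DataFrame): List[Point] = {
    import df.sparkSession.implicits._
    df.select(
        col("_c1").cast("double").as("_c1"),
        col("_c2").cast("double").as("_c2"))
      .as[Point]
      .collect()
      .toList
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not from the thread).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-to-list")
      .getOrCreate()
    import spark.implicits._

    // First three rows of the question's DataFrame, created as integers.
    val df = Seq((0, 0), (8, 2), (9, 1)).toDF("_c1", "_c2")
    val points = toPoints(df)

    // The fields are already Double, so no per-use casting is needed.
    println(points.map(_._c2))
    spark.stop()
  }
}
```

Note that `collect()` pulls every row to the driver, so this is only reasonable for data that fits in driver memory.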
Answer 2 (score: -1)
myDataFrame.select("_c1", "_c2").collect().map(each => (each.getAs[Double]("_c1"), each.getAs[Double]("_c2"))).toList
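A caveat about the line above: `getAs[Double]` does not convert values, so it can fail at runtime with a `ClassCastException` if the columns are actually stored as integers or strings. One possible alternative, sketched here for spark-shell or a Scala script, is to cast inside Spark first and then use a typed tuple encoder. The sample rows and the local `SparkSession` setup are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for experimentation; getOrCreate reuses an existing session if there is one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("tuple-encoder-sketch")
  .getOrCreate()
import spark.implicits._

// A few rows from the question, deliberately created as integers.
val myDataFrame = Seq((0, 0), (8, 2), (9, 1)).toDF("_c1", "_c2")

// Cast both columns to double, then read each row as a (Double, Double) tuple.
val asDoubles: List[(Double, Double)] = myDataFrame
  .select(col("_c1").cast("double"), col("_c2").cast("double"))
  .as[(Double, Double)]
  .collect()
  .toList
```

The resulting `asDoubles` has the shape asked for in the question, a list of pairs in which every element is a Double.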