How to convert a list of tuples to a DataFrame in Scala

Time: 2016-12-19 18:13:57

Tags: scala apache-spark dataframe apache-spark-sql

I have a list of string tuples: List[(String, String, String)]

How can I convert it to a DataFrame using Scala?

2 answers:

Answer 0 (score: 4)

Create a SparkSession (from Spark 2.0.0 onward) or a SQLContext, then use the implicit toDF() conversion:

Spark 1.6 or earlier:

import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SQLContext}

val sc = new SparkContext("local", "test")
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// list is the List[(String, String, String)] from the question
val df: DataFrame = list.toDF() // default column names: _1, _2, _3
val dfWithColNames: DataFrame = list.toDF("col1", "col2", "col3")

Spark 2.0.0 or newer:

import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession: SparkSession = SparkSession.builder().appName("test").master("local").getOrCreate()
import sparkSession.implicits._

val df: DataFrame = list.toDF() // default column names: _1, _2, _3
val dfWithColNames: DataFrame = list.toDF("col1", "col2", "col3")
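
As an alternative, here is a minimal sketch (assuming Spark 2.x and the implicits import above; the case class Record is hypothetical) that maps the tuples to a case class so the column names come from its field names instead of being passed to toDF:

// Hypothetical case class; its field names become the DataFrame column names.
case class Record(col1: String, col2: String, col3: String)

val dfFromCaseClass: DataFrame = list.map { case (a, b, c) => Record(a, b, c) }.toDF()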

Answer 1 (score: 1)

You can use the toDF method:

scala> val myList = List(("a1", "a2", "a3"), ("b1", "b2", "b3"), ("c1", "c2", "c3"))
myList: List[(String, String, String)] = List((a1,a2,a3), (b1,b2,b3), (c1,c2,c3))

scala> myList.toDF("col1", "col2", "col3").show
+----+----+----+
|col1|col2|col3|
+----+----+----+
|  a1|  a2|  a3|
|  b1|  b2|  b3|
|  c1|  c2|  c3|
+----+----+----+

Depending on your setup, you may first need to run import sqlContext.implicits._ (or import spark.implicits._ in Spark 2.x) to make toDF available.
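
For completeness, a minimal sketch (assuming a Spark 2.x SparkSession) that avoids the implicits import entirely by passing the list of tuples to createDataFrame:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
// createDataFrame accepts a Seq of tuples (Products) directly; toDF then renames the columns.
val df = spark.createDataFrame(myList).toDF("col1", "col2", "col3")
df.show()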