How do I convert all the column names of a Spark DataFrame into a Seq variable?
Input data & schema
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
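I need to store all the column names in a variable using Scala. I tried the following, but it returns a Seq[StructField] rather than the Seq[Column] I need.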
val selectColumns = dataset1.schema.fields.toSeq
selectColumns: Seq[org.apache.spark.sql.types.StructField] = WrappedArray(StructField(KEY1,StringType,true),StructField(KEY2,StringType,true),StructField(ID,StringType,true))
Expected output:
val selectColumns = Seq(
col("KEY1"),
col("KEY2"),
col("ID")
)
selectColumns: Seq[org.apache.spark.sql.Column] = List(KEY1, KEY2, ID)
Answer 0 (score: 9)
You can use the following, which returns the column names as a Seq[String]:
val selectColumns = dataset1.columns.toSeq
scala> val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1: org.apache.spark.sql.DataFrame = [KEY1: string, KEY2: string ... 1 more field]
scala> val selectColumns = dataset1.columns.toSeq
selectColumns: Seq[String] = WrappedArray(KEY1, KEY2, ID)
Answer 1 (score: 5)
import org.apache.spark.sql.functions.col
val selectColumns = dataset1.columns.toList.map(col(_))
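For completeness, here is a minimal sketch of how the one-liner above could be run end to end and fed back into select. The local SparkSession, the app name "columns-demo", and the shortened two-row dataset are assumptions for illustration; in spark-shell the spark value already exists.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("columns-demo").getOrCreate()
import spark.implicits._ // needed for .toDF on a Seq of tuples

val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0")).toDF("KEY1", "KEY2", "ID")

// dataset1.columns is an Array[String]; map each name to a Column.
val selectColumns: Seq[Column] = dataset1.columns.toSeq.map(col)

// select takes Column varargs, so expand the Seq with `: _*`.
val selected = dataset1.select(selectColumns: _*)
```

Here selected should have the same columns, in the same order, as dataset1.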
Answer 2 (score: 3)
I use the columns property, like this:
val cols = dataset1.columns.toSeq
Then, if you later want to select all the columns in their original order, you can use:
val orderedDF = dataset1.select(cols.head, cols.tail: _*)
Answer 3 (score: 1)
The columns can also be obtained from the schema.
val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0"), ("70", "b", "4"), ("71", "d", "4")).toDF("KEY1", "KEY2", "ID")
dataset1.printSchema()
root
|-- KEY1: string (nullable = true)
|-- KEY2: string (nullable = true)
|-- ID: string (nullable = true)
val selectColumns = dataset1.schema.fieldNames
selectColumns: Array[String] = Array(KEY1, KEY2, ID)
val selectColumns2 = dataset1.schema.fieldNames.toSeq
selectColumns2: Seq[String] = WrappedArray(KEY1, KEY2, ID)
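The schema route can also produce Column objects directly, which is closer to what the question asks for. The sketch below is one possible way to do it, not the answer author's code: it maps each StructField to a Column through its name field. The local session, the app name, and the shortened dataset are assumptions for illustration.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("schema-fields-demo").getOrCreate()
import spark.implicits._

val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0")).toDF("KEY1", "KEY2", "ID")

// Plain names, as in the answer above.
val names: Seq[String] = dataset1.schema.fieldNames.toSeq

// Each StructField exposes the column name as `name`; wrap it with col(...).
val fromFields: Seq[Column] = dataset1.schema.fields.toSeq.map(f => col(f.name))
```

Going through fields rather than fieldNames also keeps each column's dataType at hand, should it be needed later.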
Answer 4 (score: 0)
We can put the column names of a dataset or table into a Seq variable in the following ways.
From a dataset:
val col_seq:Seq[String] = dataset.columns.toSeq
From a table:
val col_seq:Seq[String] = spark.table("tablename").columns.toSeq
or
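// Read the name out of each Row; calling toString on a Row would yield "[KEY1]" with brackets
val col_seq:Seq[String] = spark.catalog.listColumns("tablename").select("name").collect.map(row => row.getString(0)).toSeq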
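The two table-based variants can be tried against a temporary view, as in the rough sketch below. Registering dataset1 as a view called "tablename" is an assumption made here so that the example has a table to read; the local session and app name are also illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("table-columns-demo").getOrCreate()
import spark.implicits._

val dataset1 = Seq(("66", "a", "4"), ("67", "a", "0")).toDF("KEY1", "KEY2", "ID")

// Assumption: expose the DataFrame as a temporary view so it can be looked up by name.
dataset1.createOrReplaceTempView("tablename")

// Variant 1: load the table as a DataFrame and read its column names.
val fromTable: Seq[String] = spark.table("tablename").columns.toSeq

// Variant 2: ask the catalog; each catalog Column entry has a `name` field.
val fromCatalog: Seq[String] = spark.catalog.listColumns("tablename").collect.map(_.name).toSeq
```

The first variant is the shorter of the two; the catalog variant additionally exposes metadata such as dataType and nullable on each entry.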